Journal of Applied Psychology 


John G. Darley, Editor 
University of Minnesota 





Table of Contents 


Complex Feedback Displays in a Man-Machine System: H. E. Bamford, Jr. and M. L. Ritchie 
Aiming, Transfer of Training, and Knowledge of Results: H. C. W. Stockbridge and B. Chambers 


Changes in Human Relations Attitudes: A. J. Spector 
Numeral Form as a Variable in Numeral Visibility: R. S. Soar 


Readability of Braille as a Function of Three Spacing Variables: E. Meyers, Doris Ethington, 
and S. Ashcroft 


Intra-Individual Differences in Sensory Channel Preference: B. R. Kay 

A Selection Set Preference Index: R. E. Krug 

Preference for Foods in Relation to Cost: P. H. Benson and D. R. Peryam 
Response Set in Measurement of Food Preference: H. G. Schutz and J. Kamenetzky 


Relations Among Scores on Edwards Personal Preference Schedule, California Psychological 
Inventory, and Strong Vocational Interest Blank for an Industrial Sample: M. D. Dunnette, 
W. K. Kirchner, and JoAnne DeGidio 


Output Rates Among Coil Winders: H. F. Rothe and C. T. Nye 


Dimensions of Work Satisfaction in the Occupational Choices of College Freshmen: A. W. Astin 


Gain in Proficiency as a Criterion in Test Validation: W. H. Manning and P. H. DuBois 


A Note on the Reliability and Validity of the Minnesota Scale for Paternal Occupations as an 
Estimate of Family Economic Status: J. L. Holland 


A Hierarchical Factor Analysis of Foreman Behavior: J. A. Creager and F. D. Harding, Jr. 


A Machine Method of Computing Guttman’s Coefficient of Reproducibility with a Large 
Sample: R. W. Heath 


A General Device Versus More Specific Devices for Selecting Car Salesmen: J. E. Kennedy 


An Empirical Comparison of Two Methods of Test Selection and Weighting: C. H. Lawshe 
and P. J. Patinka 





American Psychological Association 


Volume 42, Number 3 June, 1958 





Consulting Editors 


Harold E. Burtt, Ohio State University 

Alphonse Chapanis, Johns Hopkins University 

Clifford E. Jurgensen, Minneapolis Gas Company 

Laurence S. McGaughran, University of Houston 

Quinn McNemar, Stanford University 


Alexander Mintz, City College of New York 

Harold F. Rothe, Fairbanks, Morse and Company 

Julian B. Rotter, Ohio State University 

Thomas A. Ryan, Cornell University 

Donald E. Super, Columbia University 

Miles A. Tinker, University of Minnesota 

Alfred C. Welch, University of New Mexico 


Arthur C. Hoffman, Managing Editor 


Helen Orr, Circulation Manager 
Sarah Womack, Editorial Assistant 





This journal gives primary consideration to origi- 
nal investigations in any field of applied psychol- 
ogy except clinical and consulting psychology, al- 
though a descriptive or theoretical article may be 
accepted if it represents a special contribution in 
an applied field. Quantitative investigations of in- 
terest or value to psychologists working in the fol- 
lowing broad fields will be considered: vocational 
and educational prognosis, diagnosis, and guidance 
at the secondary and college level; personnel re- 
search in business, industry, and government; bio- 
mechanics; industrial working conditions; research 
on opinion and morale factors; job analysis and 
classification research; market and advertising re- 
search. 


Because of the large number of manuscripts submitted, 
authors should adhere to the rule of 
“brevity consistent with clarity.” The typical 
manuscript should run to approximately 4,000 
words. There is a lag of approximately twelve 
months between receipt and publication of an 
article. Authors may request advanced publication 
if they are prepared to pay the cost of printing 
the necessary extra pages. 


Manuscripts should be addressed to the Editor, 
John G. Darley, 408 Johnston Hall, University of 
Minnesota, Minneapolis 14, Minnesota. All manuscripts 
should be submitted in duplicate. Original 
figures are prepared for publication; duplicate figures 
may be photographic or pencil-drawn copies. 

Manuscripts must conform to the style requirements 
described in the Publication Manual of the 
American Psychological Association. 





Journal of Applied Psychology 


Published bimonthly by the 
American Psychological Association 
Prince and Lemon Sts., Lancaster, Pa. 
and 1333 Sixteenth Street N.W. 


Washington 6, D. C. 


$8.00 per volume          $1.50 per issue 


Subscriptions, orders, and business communications should be addressed to the American 
Psychological Association, 1333 Sixteenth St. N.W., Washington 6, D. C. Address changes must reach the 
office by the 10th of the month to take effect the following month. Undelivered copies resulting 
from address changes will not be replaced. Other claims for undelivered copies must be made 
within four months of publication. 


Entered as second-class matter, August 19, 1943, at the post office at Lancaster, Pa., under the act of March 3, 1879. 


© 1958 by the American Psychological Association, Inc. 





Journal of Applied Psychology 








Vol. 42, No. 3 


JUNE, 1958 








Complex Feedback Displays in a Man-Machine System 1, 2 


Harold E. Bamford, Jr. and Malcolm L. Ritchie 


Ritchie and Associates, Inc. 


In this paper is reported one of a series of 
studies addressed to the development of effec- 
tive standby instrumentation for the direc- 
tional control of an aircraft in the emergency 
of attitude indicator failure. In a previous 
study of the series (3), it was shown that the 
standard turn indicator was not adequate for 
this purpose. The turn indicator is an in- 
strument which senses, by gyroscopic preces- 
sion, a change in heading, and in whose dis- 
play is indicated the rate of turn. 

It was also shown that the ability of an S 
to accomplish straight and level “flight” in a 
YF 102 flight simulator without reference to 
the attitude indicator is increased by quicken- 
ing the turn indicator display. The quicken- 
ing was accomplished by simulating the ro- 
tation of the instrument’s gyro assembly in 
its mounting. By this means it was possible 
to add to the signal resulting from the air- 
craft’s change of heading (turn) a second 
signal produced by the precession of the 
gyroscope in response to a change of bank 
(roll). 

In order to maintain straight and level 
flight by reference to the quickened turn in- 
dicator, the pilot merely compensated for de- 
flections of the indicator needle from its null 
position by the appropriate operation of his 
aileron control. But a deflection of the needle 
might have resulted from any one or any combination 
of four different causes. It might 
have resulted from either roll or turn, and 
either type of rotation might have been the 
consequence of control action or of atmospheric 
turbulence. 


1 This article was supported by the United States 
Air Force under Contract No. AF 33(616)-3000, 
monitored by the Aero-Medical Laboratory of 
Wright Air Development Center. Permission is 
granted for reproduction, publication, use, and disposal, 
in whole or in part, by and for the United 
States government. 

2 The investigation reported here was carried out 
while the authors were on the staff of the University 
of Illinois. 

In the study referred to, it was possible to 
accomplish a partial filtering of rough air ef- 
fects by increasing the damping of the indi- 
cator needle’s movement. This change re- 
sulted in an improvement of performance over 
that achieved with the standard turn indicator. 
It is the hypothesis of the present 
study that performance can be improved by 
providing distinct display indications of the 
two different control-induced effects—roll and 
turn—just as performance was improved by 
partially separating the control-induced ef- 
fects from those of rough air. 


Method 


The experimental apparatus employed in this study 
consisted of a YF 102 flight simulator, an experimental 
model of an instrument called the integrated 
roll and turn indicator, and a device for measuring 
the performance of the man-machine system of 
which the simulator formed a part. 


The Simulator 


This is an electronic analog computer which con- 
tinuously solves the flight equations of the YF 102 
aircraft. It is equipped with an actual YF 102 cock- 
pit in which the instrument display indications 
change continuously in response to the operator's 
control actions just as they would in actual flight. 
The display indications also vary according to a 
forcing function which simulates atmospheric turbu- 
lence. The intensity of the turbulence can be varied 
arbitrarily, but was set in this experiment at 50% 
of the system’s capacity. The function generator was 
modified so as to provide a continuous rather than 
an intermittent perturbation. This was necessary in 
order that all performance could be measured under 
conditions of similar difficulty. 


The Integrated Roll and Turn Indicator 


This instrument was designed by the authors spe- 
cifically to provide distinct indications to the pilot 
of the two consequences of aileron control action: 
changes in the rate of roll and in the rate of turn. 
It was called an “integrated roll and turn indicator” 
because of the manner in which the two effects are 
indicated. An integrated instrument has been de- 
fined by Ritchie (2) as “a combined instrument in 
which the geometric relationship between two dis- 
played parameters results in a third useful parame- 
ter, all three being visible in the display.” 

As can be seen in Fig. 1, there are two moving in- 
dices in the display of the experimental instrument. 
The Number 1 Index is driven in the same fashion 
as the needle in the standard turn indicator—i.e., by 
the rate-of-turn feedback signal. The signal driving 
the Number 2 Index is obtained by simulating a 10° 
rotation of the turn indicator’s gyro assembly so as 
to produce positive quickening of the display indica- 
tion. This simulation was accomplished by a system 
which was incorporated into the flight simulator 
especially for that purpose. As we have explained 
elsewhere (3), such rotation results in the addition 
of a rate-of-roll signal to the rate-of-turn signal. 
Since the two moving indices indicate, by their re- 
lation to the reference index, rate of turn and rate 
of turn plus rate of roll, the angle between them con- 
stitutes a display indication of rate of roll. Ritchie’s 
definition is accordingly satisfied. 
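
The relation among the three displayed parameters can be illustrated with a small numerical sketch. It is not a model of the authors' circuitry: it assumes that each index deflection is a linear function of its driving signal, and the names `IntegratedDisplay` and `roll_gain`, like the sample values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IntegratedDisplay:
    """Toy model of the two moving indices (all units arbitrary)."""

    roll_gain: float = 1.0  # hypothetical weight of the rate-of-roll signal

    def index_1(self, turn_rate: float) -> float:
        # Number 1 Index: driven by the rate-of-turn signal alone.
        return turn_rate

    def index_2(self, turn_rate: float, roll_rate: float) -> float:
        # Number 2 Index: rate of turn plus the added rate-of-roll signal.
        return turn_rate + self.roll_gain * roll_rate

    def angle_between(self, turn_rate: float, roll_rate: float) -> float:
        # The separation of the indices is the third displayed parameter.
        return self.index_2(turn_rate, roll_rate) - self.index_1(turn_rate)


display = IntegratedDisplay(roll_gain=0.5)
separation = display.angle_between(turn_rate=3.0, roll_rate=2.0)
print(separation)  # 1.0, i.e. roll_gain * roll_rate, independent of turn_rate
```

Under the linearity assumption the separation depends on the rate of roll alone, which is the sense in which the angle between the indices serves as a rate-of-roll indication.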

Direction control of the aircraft by reference to 
this instrument display is very simple, since the angle 
(c) between the two moving indices varies in almost 
immediate response to the control action (see discus- 
sion below). To maintain any constant rate of 
turn, the pilot simply compensates for any deflection 
of the Number 2 Index from the Number 1 Index 
by a movement, in the opposite direction, of his 
aileron control. Straight and level flight is main- 
tained in this fashion when the Number 1 Index is 
aligned with the reference index. 


Fig. 1. The integrated roll and turn indicator. 


The Measuring Apparatus 


Since the flight simulator is an electronic analog 
computer, it is possible to measure electrical poten- 
tials at different points in it which are at any in- 
stant proportional to the various parameters of the 
flight equations. Apparatus was installed which 
could detect potentials proportional to simulated 
altitude, heading, pitch attitude, and bank attitude. 
The difference between the simulated value of a 
parameter and the corresponding desired value is, by 
definition, the error in that parameter. The output 
signals of the measuring apparatus are potentials 
proportional to the time integrals of the squared 
errors. These potentials are accumulated during a 
thirty-second measurement period, and at the end of 
that period are indicated in the display of a volt- 
meter. 

It is a deficiency of the measuring apparatus that 
it is only below a critical value that the measure- 
ments increase as a linear function of the integral of 
squared error with respect to time. Above that 
value they continue to increase, but the increase is 
negatively accelerated. 


The Experimental Task and Conditions 


Measurements of the performance of the man- 
machine system consisting of the S and the flight 
simulator were obtained under three conditions. In 
each condition the S was required to maintain a 
constant heading and altitude in simulated flight by 
manipulation of the elevon (aileron and elevator) 
control alone. It will be noted that, for a constant 
airspeed, rate of turn is determined by bank attitude, 
which is therefore a more sensitive index of direction 
control than is heading. 

The reference condition. In this condition the full 
instrument panel was available to the S as he performed 
his task. The reference condition, which 
simulated nonemergency flight, was included in the 
experiment to provide a base line for the interpreta- 
tion of the experimental findings. 

The control condition. This condition differed 
from the reference condition only in that the atti- 
tude indicator was covered, in simulation of the 
failure of that instrument. The S performed his task 
primarily by reference to the turn indicator, which 
was located at the bottom of the panel, directly be- 
low the heading indicator (see Fig. 2). The latter, 
in whose display was indicated the error which the 
S was to null, was located at the center of the panel. 

The experimental condition. This condition dif- 
fered from the control condition in that the turn 
indicator was covered and the integrated roll and 
turn indicator substituted therefor directly above the 
heading indicator. If the difference in display loca- 
tion was neglected, the only difference between the 
control and experimental conditions lay in the rate- 
of-roll display indication which was provided in the 
latter. Both conditions differed from the reference 
condition in that they simulated the failure of the 
attitude indicator. The experiment was designed to 
permit a comparison between performance measurements 
obtained under the control and experimental 
conditions, and to relate the differences to measure- 
ments of nonemergency performance obtained under 
the reference condition. 


Measurement 


Error measurements were made in all four chan- 
nels by means of the apparatus described above. The 
measurements of pitch and altitude error were made 
to insure that the Ss did not neglect their elevation 
control. Our primary interest centered in the direc- 
tion channels, bank and heading. 

The forcing function simulating rough air consti- 
tuted an uncontrolled source of variability in these 
data, since the pattern of perturbation was not con- 
stant for each measurement. However, a sufficient 
number of measurements were made under each of 
the three conditions that we believe the difficulty 
level did not vary systematically. Its variation did, 
however, reduce the precision of our measurements 
to an unknown degree. 


Subjects 


Ten Air Force pilots were tested. Five were quali- 
fied in jet aircraft and five were not. Some of these 
Ss were tested while wearing oxygen equipment and 
breathing pure oxygen because of a simultaneous ex- 
periment in oxygen consumption which was being 
conducted. With one exception, Ss who wore oxygen 
equipment wore it under all three experimental con- 
ditions, and so it is unlikely that any bias was in- 
troduced. In the case of the single exception, the 
oxygen equipment failed during the course of the ex- 
periment. The S, who was not jet qualified, was 
therefore dropped from the sample. 


Procedure 


Each S was given a five-minute familiarization 
period in the simulator, during which he manipu- 
lated the controls without direction by the experi- 
menter. Following the familiarization period he was 
allowed three minutes of practice under each of the 
experimental conditions. 

After the familiarization and practice, each S per- 
formed the experimental task during twelve 30-sec. 
trials. Four of these trials were under each of the 
three conditions. The order of conditions was: ref- 
erence, control, experimental, experimental, control, 
reference, reference, control, experimental, experi- 
mental, control, and reference. At the end of each 
30-sec. trial the measurements described above were 
recorded by the experimenter. 
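
The stated trial order has a balancing property that a few lines of code can check. The sketch below is only an illustration. The text does not say why this order was chosen, so the reading that it equates the conditions for average serial position is an assumption.

```python
from collections import defaultdict

# Trial order as stated in the Procedure section.
ORDER = [
    "reference", "control", "experimental",
    "experimental", "control", "reference",
    "reference", "control", "experimental",
    "experimental", "control", "reference",
]

positions = defaultdict(list)
for trial_number, condition in enumerate(ORDER, start=1):
    positions[condition].append(trial_number)

# Average serial position of each condition within the twelve trials.
mean_position = {c: sum(p) / len(p) for c, p in positions.items()}
print(dict(positions))
print(mean_position)  # every condition averages position 6.5
```

Each condition occupies four of the twelve trials, and all three share the same mean serial position, so a linear practice or fatigue trend across the session would affect the three condition means equally.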


Results and Analysis 


Error indices were computed by extracting 
the square roots of the measurements. Since 
each measurement was an electrical potential 


143 


proportional to the time integral of squared 
error, these indices amount to error root mean 
squares in arbitrary units. 
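
The computation can be sketched as follows. This is an idealized illustration, not a description of the analog apparatus: it assumes a sampled error signal and rectangular integration, ignores the nonlinearity above the critical value, and uses hypothetical function names and data.

```python
import math


def squared_error_integral(errors, dt):
    """Approximate the time integral of squared error by a rectangle sum."""
    return sum(e * e for e in errors) * dt


def error_index(errors, dt):
    """Square root of the accumulated integral of squared error."""
    return math.sqrt(squared_error_integral(errors, dt))


# Hypothetical 30-second trial sampled once per second with a constant error of 2.
trial = [2.0] * 30
print(squared_error_integral(trial, dt=1.0))  # 120.0
print(error_index(trial, dt=1.0))             # about 10.95
```

Because every trial lasted the same 30 sec., an index formed in this way differs from the root-mean-square error only by the constant factor of the square root of the trial duration, which is why the units can be treated as arbitrary.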

The extraction of the square root of a 
measurement is legitimate, strictly speaking, 
only for those measurements which do not 
exceed the critical value discussed above. 
However, only a small proportion of the data 
exceed that value. Those which do must be 
regarded as underestimates of the true values, 
but the rank order of the latter is preserved. 
Inspection of the data disclosed little sys- 
tematic association between the affected meas- 
urements and the conditions of the experi- 
ment. In the bank channel two Ss each pro- 
duced two supracritical measurements under 
the control condition and one under the ex- 
perimental condition. Under the control con- 
dition a third S produced the seventh supra- 
critical measurement in this channel. Thus 
the bank criterion is given a slight conserva- 
tive bias—i.e., it is loaded against the ex- 
perimental condition—by the acceptance of 
the measurements at face value. Two pitch 
measurements were affected, one under each 
condition; and one heading measurement un- 
der the control condition and one altitude 
measurement under the experimental condi- 
tion were affected. For these reasons we do 
not believe that any disturbance of the analy- 
sis, beyond a slight reduction in the total 
variability and a slight conservative bias, has 
resulted from our decision to disregard the 
nonlinearity of our data. 


Direction Control 


Table 1 

Mean Error Index for Direction Control Channels 

                        Condition 
Channel     Reference   Control   Experimental 
Heading                 3.58      2.50 
Bank                    7.04      5.46 

Note. Each mean is based on 36 measurements. 


Table 2 

Analysis of Variance of Heading Error Index 

Source of Variation         df 
Subjects 
Turn indicators 
Trials 
Turn indicators X trials 
Error 


The effects of the experimental conditions 
on the direction error indices are shown in 
Table 1. The reference condition is clearly 
the most conducive to superior performance 
by either criterion, as we had expected. The 
increase in the mean heading error index which 
attends the loss of the attitude indicator is 
only 16% under the experimental condition, 
whereas an increase of 67% is sustained un- 
der the control condition. In the bank chan- 
nel the corresponding increases are 69% and 
120%. These percentage increases in mean 
error index measure the degradation of direc- 
tion control which ensues when the S is de- 
nied the use of the attitude indicator. 
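
The percentage figures express each condition mean relative to the reference mean. As a worked illustration, the sketch below takes the control and experimental means from Table 1 together with the increases quoted in the text and backs out the reference mean that each pair implies. The implied values are an inference subject to rounding, not figures taken from the table, and the function names are hypothetical.

```python
def percent_increase(condition_mean, reference_mean):
    """Percentage by which a condition mean exceeds the reference mean."""
    return 100.0 * (condition_mean - reference_mean) / reference_mean


def implied_reference(condition_mean, reported_percent):
    """Reference mean implied by a condition mean and a reported increase."""
    return condition_mean / (1.0 + reported_percent / 100.0)


# Condition means from Table 1 and the increases quoted in the text.
heading = {"control": (3.58, 67), "experimental": (2.50, 16)}
bank = {"control": (7.04, 120), "experimental": (5.46, 69)}

for channel, rows in (("heading", heading), ("bank", bank)):
    for condition, (mean, pct) in rows.items():
        ref = implied_reference(mean, pct)
        print(f"{channel:8s} {condition:13s} implied reference = {ref:.2f}")
```

Within each channel the two implied reference means agree closely (roughly 2.15 for heading and 3.2 for bank), which indicates that the quoted percentages and the tabled means are mutually consistent.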

Analyses of variance were performed on the 
bank and heading error indices obtained un- 
der the control and experimental conditions, 
and they are summarized in Tables 2 and 3. 
As is made apparent in those tables, much of 
the variability of the error indices is the effect 
of individual differences. However, due to 
the design of the experiment, which provided 
that each S be tested four times under each 
condition, this effect does not contaminate our 
estimation of the effects of conditions and 
trials. 

Table 3 

Analysis of Variance of Bank Error Index 

Source of Variation         df      MS      F 
Subjects                            33.33 
Turn indicators                     44.47 
Trials                               5.01 
Turn indicators X trials             2.33 
Error                                4.61 


The heading error index shows an effect of 
trials which is significant at the .05 level of 
confidence, although the bank index reveals 
no such effect. A possible interpretation is 
that the control of heading by reference to its 
derivative was still being learned when the 
measurements were made, while the simpler 
task of controlling the derivatives had already 
been learned. The mean index of heading 
error declined steeply from the first to the 
third trial, and rose slightly on the fourth 
trial, probably as a result of fatigue. 

The percentage increase in the mean index 
of bank error observed under the control con- 
dition is almost twice that observed under the 
experimental condition. The difference is sig- 
nificant at the .005 level of confidence. The 
percentage increase in the mean index of 
heading error observed under the control con- 
dition is more than four times that observed 
under the experimental condition, and this 
difference is significant at the .025 level of 
confidence. The disparity in reliability be- 
tween the two differences is explained by the 
greater sensitivity of bank attitude as an index 
of direction control. 
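
The variance ratios behind these statements can be approximated from the mean squares given in Table 3 for the bank channel. The sketch assumes that the error mean square is the denominator for every effect. That assumption is ours, and the resulting ratios are illustrative rather than the values originally tabled.

```python
# Mean squares from Table 3 (bank error index).
MEAN_SQUARES = {
    "Subjects": 33.33,
    "Turn indicators": 44.47,
    "Trials": 5.01,
    "Turn indicators x trials": 2.33,
}
ERROR_MS = 4.61


def f_ratio(effect_ms, error_ms=ERROR_MS):
    """F ratio of an effect mean square to the error mean square."""
    return effect_ms / error_ms


for source, ms in MEAN_SQUARES.items():
    print(f"{source:26s} F = {f_ratio(ms):.2f}")
```

On this assumption the ratio for turn indicators is about 9.6 and the ratio for trials is about 1.1, in line with the report of a reliable display effect and no trials effect in the bank channel.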


Elevation Control 


The effects of the three conditions on the 
elevation error indices are shown in Table 4. 
As in the case of direction control, the refer- 
ence condition yields the best performance by 
both criteria. There is no difference between 
the experimental and control conditions with 
respect to the mean index of pitch error. 
Both conditions yield an increase of 90%. 
With respect to the mean index of altitude 
error, however, the experimental condition 
shows an increase of 84%, while the control 
condition shows only a 25% increase. The 
difference between the two percentage increases 
is of the same order as that which 
was observed in the heading channel, but in 
the opposite direction.3 


Table 4 

Mean Error Index for Elevation Control Channels 

                        Condition 
Channel     Reference   Control   Experimental 
Altitude                1.80      2.65 
Pitch                   3.09      3.10 

* p < .005. 

Note. Each mean is based on 36 measurements. 


Fig. 2. Panel arrangement under conditions of experiment. 

These effects may be due to the panel ar- 
rangement. As can be seen in Fig. 2, the inte- 
grated roll and turn indicator was mounted 
above the regular instrument panel. Thus the 
distance from the integrated roll and turn in- 
dicator to the altimeter or to the rate of climb 
indicator was considerably greater than the 
distance to either of those instruments from 
the turn indicator. If an extended scan dis- 
tance is inimical to precision of control, we 
can account for the fact that the control of 
altitude and only altitude was degraded. 

As has been noted above, pitch and altitude 
errors were measured merely to insure that 
the Ss did not neglect their elevation control. 
The data do not indicate that elevation con- 
trol was neglected, although it was certainly 
degraded under the experimental and con- 
trol conditions. 


Discussion and Conclusions 


The results of this experiment demonstrate 
the favorable effect upon human regulatory 
performance of a certain modification in a 
derivative feedback display. (Rate of turn is 
the first derivative of the criterion feedback 
signal, which was indicated in the heading 
indicator display.) 


3 A probabilistic assessment of this difference is 
difficult because of the markedly skewed distribution 
of the data. 


Analysis of the Experimental Display 


The only essential difference between the 
control condition and the experimental condi- 
tion is that in the latter the rate of roll is in- 
dicated in a feedback display. 

Distinct indication. The favorable effect 
upon system output of a display indication 
of a linear function of rate of roll and rate of 
turn has previously been demonstrated, as 
has the favorable effect of a partial filtering 
from the display indication of the direct ef- 
fects of the system input (3). From the lat- 
ter finding and from the results of the present 
experiment we may venture a generalization. 
A feature common to the experimental dis- 
plays of these two studies is the distinct indi- 
cation of those components of system output 
which resulted from control action. By “distinct 
indication” we mean indication of those 
components in such a way as to enable the 
operator to distinguish them from other con- 
trol-induced effects as well as from the direct 
consequences of the system input. We pro- 
pose that this feature is at least in part re- 
sponsible for the enhancement of the S’s per- 
formance demonstrated in both experiments. 

Anticipatory indication. Clearly the dis- 
play of the integrated roll and turn indicator 
is not the only possible implementation of dis- 
tinct indication. As we have remarked, this 
is an integrated display. Analysis of the way 
in which it is integrated leads to another ab- 
stract feature of the experimental display. It 
is a property of that display that at any in- 
stant the Number 2 Index occupies a position 
which will be occupied by the Number 1 In- 
dex after a definite interval of time if the in- 
dicated rate of roll is maintained during that 
interval. That is to say, the Number 2 Index 
predicts a future rate of turn on the basis of 
the current rate of roll, which is essentially 
the rate of change of the rate of turn. Such 
an arrangement we term anticipatory indication. 
(The effects upon performance of such 
parameters as the prediction interval and 
the order of extrapolation have yet to be 
studied.) 
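
The predictive property can be shown with a first-order extrapolation. The sketch treats the indicated rate of roll as the rate of change of the rate of turn, as the text does, and assumes it is held constant over the interval. The prediction interval `TAU` and the sample values are hypothetical, since the article does not state the interval of the experimental instrument.

```python
def number_2_index(turn_rate, turn_rate_derivative, prediction_interval):
    """First-order extrapolation of rate of turn over the prediction interval."""
    return turn_rate + prediction_interval * turn_rate_derivative


def future_turn_rate(turn_rate, turn_rate_derivative, elapsed):
    """Rate of turn after `elapsed` seconds if its derivative stays constant."""
    return turn_rate + turn_rate_derivative * elapsed


TAU = 2.0            # hypothetical prediction interval, in seconds
turn_now = 1.5       # arbitrary units
turn_change = 0.25   # stands in for the indicated rate of roll

predicted = number_2_index(turn_now, turn_change, TAU)
actual = future_turn_rate(turn_now, turn_change, elapsed=TAU)
print(predicted, actual)  # both 2.0 while the rate of roll is held constant
```

If the rate of roll changes during the interval, the two values diverge, which is one reason the choice of prediction interval and order of extrapolation would be expected to matter.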


Command Effectiveness and Interpretability 


We have previously distinguished two criti- 
cal psychological properties of instrument dis- 
plays, which we have defined in the following 
way: 


... the command effectiveness of a display is 
measured by a subject’s ability to discriminate the 
commanded response from alternative responses. 
And the interpretability of a display is measured by 
a subject’s ability to discriminate the designated con- 
ditions from alternative designata (1). 


In this framework it is clear that the indi- 
cation in a turn indicator commands behavior 
which will maintain the aircraft in straight 
and level flight, and that it designates the rate 
of turn of the aircraft. The effectiveness of 
the command was increased by quickening 
the display (3), but it is evident that this 
was accomplished only at a cost in inter- 
pretability. Quickening aided the S in his 
discrimination of the commanded control ac- 
tion, but his ability to discriminate the rate 
of turn was almost destroyed. He was ren- 
dered cognitively blind. 

By anticipatory indication in the integrated 
roll and turn indicator, on the other hand, we 
were able to achieve a similar increase in com- 
mand effectiveness, while preserving the turn 
indicator’s interpretability by the distinct in- 
dication of roll rate and turn rate. (It will 
be recalled that the Number 2 Index in the 
integrated roll and turn indicator is driven by 
the quickened-rate-of-turn signal.) In fact, 
the experimental instrument more comprehen- 
sively designates the system output than did 
the unmodified turn indicator. As a conse- 
quence, the pilot is enabled to adapt to more 
aspects of the total situation. Instead of be- 
ing curtailed, his domain of decision is en- 
larged. 


Two Principles of Display Design 


We have defined a principle of display de- 
sign as “the joint predication of psychologi- 
cal and physical properties of a display—i.e., 




the assertion of a relation between psycho- 
logical and physical properties” (1). Using 
that formula, we may now restate the two 
principles suggested in the analysis of the ex- 
perimental display: 

1. The command effectiveness of a feed- 
back display is increased by the distinct in- 
dication therein of control-induced compo- 
nents of the system output. 

2. The command effectiveness of a feed- 
back display is increased by the anticipatory 
indication therein of the feedback signal. 

Since the integrated roll and turn indicator 
incorporated both distinct indication and an- 
ticipatory indication, its demonstrated superi- 
ority to the control instrument cannot defini- 
tively establish either of these principles. By 
the same token, however, both principles are 
supported by the experimental results. In 
addition, the principle of distinct indication 
receives support from the effect of damping 
the movement of the indicator needle (3) as 
discussed above. 


Summary 


1. Performance of nine Air Force pilots in 
simulated flight was studied under three con- 
ditions: reference, control, and experimental. 

2. Direction control under the experimental 
condition was reliably superior to perform- 
ance under the control condition. Perform- 
ance under both experimental condition and 
control condition was inferior to that under 
the reference condition. This is a conserva- 
tive conclusion because of a conservative bias 
in the criterion. 

3. The experimental display (the integrated 
roll and turn indicator) is shown to be charac- 
terized by distinct indication of the control- 
induced components of system output, and by 
anticipatory indication of the derivative feed- 
back signal. 

4. It is argued that quickening increases 
the command effectiveness of a feedback sig- 
nal at a cost in interpretability. The inte- 
grated roll and turn indicator achieves a simi- 
lar gain in command effectiveness by antici- 
patory indication and by distinct indication 
prevents a loss in interpretability. 





5. Two principles of display design are supported 
by the data: 

a. The command effectiveness of a feedback 
display is increased by the distinct indication 
therein of control-induced components 
of the system output. 

b. The command effectiveness of a feedback 
display is increased by anticipatory indication 
therein of the feedback signal. 

Received July 1, 1957. 


References 

1. Bamford, H. E., & Ritchie, M. L. The evaluation 
of instrument displays: A point of view. In 
M. L. Ritchie & C. A. Baker (Eds.), Psychological 
aspects of cockpit design: a symposium 
report. USAF WADC Tech. Rep., 1957, No. 
57-117, 64-72. 

2. Ritchie, M. L. Integrated instruments: a drag 
indicator. USAF WADC Tech. Rep., 1955, 
No. 55-423. 

3. Ritchie, M. L., & Bamford, H. E. Quickening 
and damping a feedback display. J. appl. 
Psychol., 1957, 41, 395-402. 





Journal of Applied Psychology 
Vol. 42, No. 3, 1958 


Aiming, Transfer of Training, and Knowledge of Results 1 


H. C. W. Stockbridge and B. Chambers 
Ministry of Supply, United Kingdom 


Knowledge of results has been shown to be 
necessary to learning (6,7). There are, how- 
ever, frequently occasions on which additional 
knowledge of results may be given in order to 
improve performance. A modification of the 
apparatus employed may give extra informa- 
tion in the same sensory mode as the stimulus 
or through another channel. Thus the sound 
of a buzzer may show when S is on target (3), 
or Ss may learn from a counter what their 
production has been (2). 

In both these examples the additional 
knowledge of results was achieved by a modi- 
fication of the apparatus. The important dif- 
ference between the two examples, however, 
is that in the first, the aiming apparatus is a 
synthetic trainer and in the real situation the 
modification giving knowledge of results must 
be removed; in the second example the incen- 
tive device may be permanently retained. 

The present experiment is an investigation 
of the effect of knowledge of results on syn- 
thetic training. The experiment takes the 
form of an inquiry into transfer of training 
using an aiming task with a moving target. 
A control group and a training group of Ss 
were used and the rates of learning and for- 
getting of both groups of Ss were computed. 
The type of knowledge of results used was 
both random and intermittent in an attempt 
to deter the S from relying too heavily on 
cues which would not be present in the real 
situation and to retain his interest in the task 
by an element of uncertainty. 


Method 
Apparatus 


A British Army rifle No. 4 Mk. 1, pivoted be- 
neath the magazine, could be aimed at a black target, 


1 A fuller account may be found in the original
M.A. thesis, filed in the University of Reading Li- 
brary, England. The authors wish to thank R. C. 
Oldfield for his guidance, J. Draper for statistical 
advice and J. Eccleston for the computations. The 
British Crown Copyright of this paper is reserved. 
It is published with the permission of H.B.M. Sta- 
tionery Office. 




1 in. × 0.4 in., pasted to the recording belt of a kymograph
111 in. from the backsight of the rifle. The
target moved from the S’s right to his left at ap- 
proximately 100 mm. per second (3.94 in. per sec.), 
and was visible for about 26 in., or 6.6 sec. A screen 
defined the field of view for the S. This screen was 
54 in. from the recording band of the kymograph 
and for normal shooting an opening of 24 in. was 
used. Before and after the main training of Groups 
2 through 5 inclusive, however, an opaque “hedge” 
was put up on this screen, through which bullets 
might be imagined to pass, which obscured all but 
the first and last three inches of the target’s path. 

In more general terms the target’s length subtended 
33’ of arc at the backsight, and moved at 2° 10’ of 
arc a second across a field of visibility of 13° 8’ of
arc. When the “hedge” was erected the target was 
only visible for 1° 38’ of arc at the beginning and 
end of its transit. 

The speed of the target corresponds to a 27 ft. ve- 
hicle at a range of 1,000 yd. moving at a speed of 
74 m.p.h. 
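The angular figures and the full-scale equivalents above can be checked with a little trigonometry. The sketch below is only an illustration, assuming the 1 in. target length, the 111 in. viewing distance, and the 100 mm. per second belt speed quoted in the text; the function and variable names are hypothetical. It gives values close to, though not identical with, the printed ones, the differences presumably reflecting rounding in the original measurements.

```python
import math

def visual_angle_arcmin(size, distance):
    """Angle in minutes of arc subtended by `size` at `distance` (same units)."""
    return math.degrees(2 * math.atan(size / (2 * distance))) * 60

# Values quoted in the text (inches and inches per second).
TARGET_LENGTH_IN = 1.0
VIEWING_DISTANCE_IN = 111.0
TARGET_SPEED_IN_PER_SEC = 3.94

# Assumed full-scale range: 1,000 yd expressed in feet.
RANGE_FT = 3000.0

target_arcmin = visual_angle_arcmin(TARGET_LENGTH_IN, VIEWING_DISTANCE_IN)

# Small-angle approximation: angular rate in radians per second.
angular_rate_rad = TARGET_SPEED_IN_PER_SEC / VIEWING_DISTANCE_IN

# Scale the same angles up to the full-scale range.
vehicle_length_ft = (TARGET_LENGTH_IN / VIEWING_DISTANCE_IN) * RANGE_FT
vehicle_speed_mph = angular_rate_rad * RANGE_FT * 3600 / 5280

print(f"target subtends about {target_arcmin:.0f} min of arc")
print(f"equivalent vehicle: {vehicle_length_ft:.0f} ft at {vehicle_speed_mph:.0f} m.p.h.")
```

The scaling relies on the small-angle approximation, which is adequate here because the target subtends well under one degree.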

The method of scoring was electrical. A point a 
length ahead of the target was said to be the correct 





Fig. 1. The electrical scoring system.







point of aim. The time in a zone a length to either 
side of the point of aim was measured. By means 
of a counter system a histogram was recorded of the 
contacts made by the 26 sampling points. Three 
banks of counters gave an indication of time spent 
“on,” “ahead,” or “behind” the target. Figure 1 
gives a schematic diagram of the circuits used to ob- 
tain the rough estimate of accuracy of tracking. 
These circuits embody a principle possibly of use in 
tasks involving sequence analysis. In its fullest de- 
velopment the principle is roughly described by say- 
ing that everything is connected to everything else, 
thus the number of circuits rises as the square of the 
pairs of switches. Such circuits may also be used to 
record the results of paired comparisons. 

Knowledge of results was given by the clicking of 
a relay. This click was not heard on every run but 
on 5 out of 10 runs at random during those 10 runs 
and the knowledge of results given may thus be de- 
scribed as random and intermittent. 
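A schedule of this kind is easy to generate. The following is a minimal sketch, assuming that exactly 5 runs in every consecutive block of 10 are chosen at random to carry the click; the paper does not describe how the random choice was actually made, and the function name and seed are hypothetical.

```python
import random

def click_schedule(n_runs, block=10, clicks_per_block=5, seed=None):
    """Return a list of booleans, one per run: True if the click is given.

    Assumption: within each block of `block` runs, exactly
    `clicks_per_block` runs are picked at random.
    """
    rng = random.Random(seed)
    schedule = []
    for start in range(0, n_runs, block):
        size = min(block, n_runs - start)
        k = min(clicks_per_block, size)
        chosen = set(rng.sample(range(size), k))
        schedule.extend(i in chosen for i in range(size))
    return schedule

# Knowledge of results was available for the first 400 runs of the main trial.
schedule = click_schedule(400, seed=1)
print(sum(schedule), "of", len(schedule), "runs carry the click")
```

Fixing the number of clicks per block keeps the overall rate at one half while leaving the S unable to predict which runs will be signalled.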


Design and Procedure 


The general design of the experiment was governed
by a Latin square. Six men were tested each
fortnight and a total of 30 Ss underwent the ex- 
periment. 

The first group of Ss were research workers, the 
remaining four groups were private soldiers in their 
early twenties including pioneers, drivers, cooks and 
parachutists. 

The Monday of the first week of an S’s stay was 
spent in allotting him to either the “knowledge” or 
the “no knowledge” group. The Ss were taken in a 
random order, and no S was given knowledge of re- 
sults for these matching runs. Each S carried out 
two groups of 50 runs, one group in the morning 
and one in the afternoon. Scores of the time spent 
in the zone and of “contacts” on the correct point 
of aim were taken every 10 runs. To divide the 30 
men into two equal groups of equal ability, each six 
men were matched on their absolute score both in 
terms of time in zone and contacts. They were also 
matched on their rate of learning. An analysis of 
variance showed that there was no difference be- 
tween the two groups of 15 men, or between any 
pair of three men comprising the five subgroups of 
six men each. 

Considering again a group of six men split into 
two for “knowledge” and “no knowledge” treatment, 
the morning of the second day was spent in aiming 
without a target while the matching statistics were 
being computed. For aiming without a target, or 
firing through a hedge (as it was expressed to the 
Ss), a record of contacts was made, both on, ahead 
and behind the target at the 15th and 30th run, and 
the time in zone was noted every 5th run; 30 runs 
in all were completed by each S in this condition. 

The main part of the trial began on the afternoon 
of the second day, each S completing his 50 runs in 
two lots of 25. Scoring of time in zone was after 
every 10th run throughout the 800 runs required of 
each S in this condition. Scoring of the number of 




contacts summed on the three banks of counters on 
the recording console was undertaken after every 50 
runs up to 250 runs when the preliminary rapid 
learning period might be thought to be complete and 
from 250 to 800 every 25 runs. While the scores 
were being copied from the counters on to the 
mimeographed record sheets, the Ss rested. No com- 
plaints were received of 25 runs being excessively 
fatiguing. The time of day at which S was asked 
to perform varied considerably in order to avoid any 
possible diurnal effects. 

The Ss were asked to aim half a length ahead of 
the front of the target, and they were told that they 
would be scored on this, but that knowledge of re- 
sults would be given by a click when they were in 
a zone of a length of the target to either side of the 
correct point of aim. 

After the 400th run, which occurred half way 
through the second Monday of the trial, the knowl- 
edge of results group were not given any knowledge 
of results in order to see whether they would retain 
any skill they might have acquired. The first half 
of the main trial was thus concerned with remem- 
bering and the second half with forgetting. The Ss 
were again asked to aim without a target at the end 
of the trial in case a differential transfer effect be- 
tween the two groups might be significant.


Results 


Figure 2 shows time in zone for all groups 
at nine periods of testing. To find whether 
there were significant differences between the 
knowledge and no knowledge subgroups, a series
of spot tests were made on these nine
periods for both methods of scoring. The
method used amounted to a t test but took
the form of an analysis of variance in order 
that the between groups effect could be elimi- 
nated, since this might be due not only to dif- 
ferences between groups of men but also to 


Fig. 2. Time in zone. Five groups, 30 men. (Ordinate: mean time in zone per 10 runs, in sec.; abscissa: number of runs, in hundreds.)







alterations of the characteristics of the ap- 
paratus. 

A difference significant at the 1% level was 
found between the knowledge and the no 
knowledge subgroups at the 275-300 runs pe- 
riod of testing (Table 1) while at the 375— 
400 period, the significance of the difference 
was (P < 5%). No other significant differ- 
ences were found. In both these cases differ- 
ences were only found in time in zone data. 
It will be remembered that knowledge of re- 
sults when given was of whether the S was in 
the zone or not. In neither of the analyses 
mentioned above was the Groups × Knowledge
interaction significant, suggesting that
the effect was consistent over all five groups. 

To check whether there were any effects of 
a cyclic nature within the two-week units into 
which the trial was split, 16 sample periods of 
50 runs each were chosen in Groups 2, 3, and 
4. Correlation coefficients were calculated 
for the knowledge and the no knowledge 
groups separately (using the 16 sample points 
referred to above), between Groups 2 and 3, 
Groups 3 and 4, and between Groups 2 and 4. 
None of the coefficients calculated was signifi- 
cant, and it was concluded that there was no 
evidence for all three groups performing bet- 
ter on any particular part of any day of the 
trial due, for example, to an end spurt, rapid 
initial learning, recovery from a week-end ef- 
fect or owing to fatigue being less at the be- 
ginning of the day. For these calculations, 
times in the zone as a mean over 10 runs were 
used. 

Results for the aiming without a target experiment
were analysed for both time in zone and the number
of contacts on target.


Table 1

Analysis of Variance of the Time in Zone

                          Time for Runs 275-300     Time for Runs 375-400
Source            df      Mean Square   Sig.        Mean Square   Sig.

Between K & NK     1        18.8972     P<1%           7.9053
Between Groups     4         2.1054     Not           12.6152
Groups × K         4         3.1521     Not            0.6488
Residual          20         2.1453                    1.6978

Total             29
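The significance entries in Table 1 can be checked from the printed mean squares. The sketch below assumes that each effect is tested against the residual mean square, which the paper does not state explicitly; the critical values are the usual tabled values of F for 1 and 20 degrees of freedom, and the variable names are hypothetical.

```python
# Mean squares as printed in Table 1.
RESIDUAL_MS_275_300 = 2.1453
RESIDUAL_MS_375_400 = 1.6978

effects_275_300 = {
    "Between K & NK": 18.8972,
    "Between Groups": 2.1054,
    "Groups x K": 3.1521,
}

# Assumption: every effect is tested against the residual mean square.
f_275_300 = {name: ms / RESIDUAL_MS_275_300 for name, ms in effects_275_300.items()}
f_knowledge_375_400 = 7.9053 / RESIDUAL_MS_375_400

# Tabled critical values of F for 1 and 20 degrees of freedom.
F_CRIT_5_PERCENT = 4.35
F_CRIT_1_PERCENT = 8.10

for name, f in f_275_300.items():
    print(f"runs 275-300, {name}: F = {f:.2f}")
print(f"runs 375-400, Between K & NK: F = {f_knowledge_375_400:.2f}")
```

On this assumption the knowledge effect gives F of about 8.8 for runs 275-300 and about 4.7 for runs 375-400, in line with the 1% and 5% levels reported in the text.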










In both cases results were obtained before 
and after the main trial and, although no 
knowledge of results was obtainable by the 
Ss, yet it was thought that there might be 
some influence of the knowledge given in the 
main trial on results taken after this training 
period: knowledge and no knowledge groups 
will continue to be distinguished in the aim- 
ing without a target condition despite the fact 
that no knowledge was ever given. 

Considering time in zone scores then, the 
effects of interest are “before,” “after,” 
“knowledge” and “no knowledge,” to use a 
convenient abbreviated form of the names of 
these conditions. Analysis of variance shows 
no significant differences between knowledge 
and no knowledge averaged over “before” and 
“after,” but the “after” mean time in the 
zone is significantly higher (P < 5%) than 
the “before” scores, indicating that some 
transfer had taken place from the main trial 
to the no target condition, and that learning 
had taken place. If the two “knowledge” 
conditions are referred to as Groups and the 
“before” and “after” conditions as Time, then 
the Group × Time interaction is not significant;
this shows that the learning was not
significantly different between the two groups. 
The mean difference between the before and 
after tests is 1.36 ± 0.58 sec./10 runs.

Table 2 shows further analyses of the aim- 
ing without a target or “hedge” data. Here 
the “on” contacts of the first and last five 
counters, and of the first counter and last two 
counters were analysed, the latter condition 
corresponding to the periods when the target 
was in view. From both these analyses it is 
clear that the 800 training runs improved an 
S’s ability to anticipate the reappearance of 
the target. The higher score on the first as 
opposed to the last counters (the Position ef- 
fect) is perhaps explained by the Ss being 
able to leave the rifle aimed at the begin- 
ning of the target’s path. They then swing 
smoothly to the left and might easily arrive 
late at the left hand opening. 

Table 2

“Hedge” Runs (Analysis of Variance of Contacts “On”)

                              First and Last            First Counter and
                              Five Counters             Last Two Counters
Source               df     Mean Square  Sig.          Mean Square  Sig.

Groups (G)            3        591.1     P<5%             55.63     Not (P<10%)
Knowledge (K)         1        243.3     Not              10.09     Not
G × K                 3      1,383.5     P<0.5%           31.56     Not
Ss within G & K      15*       155.1                      19.38
Training (T)          1      2,717.4     P<0.1%          160.46     P<0.5%
Position (P)          1     35,688.5     P<0.1%        2,023.83     P<0.1%
T × P                 1        565.0     Not (P<10%)      10.11     Not
K × T                 1         91.6     Not               0.25     Not
K × P                 1          4.0     Not               0.36     Not
K × T × P             1         77.1     Not               9.91     Not
Residual             63        169.4                      15.60

Total                91

* One subject was not available for “hedge” runs at the end of the experiment.
The residual term is a compound of all other interactions found not to differ significantly.

A more sophisticated method of investigating
the effect of knowledge is to compare
knowledge and no knowledge scores on the
basis of “after” scores, as if the “before” results
had been the same for both conditions
of knowledge. The technique used to achieve
this is analysis of covariance which effectively
compares the rates of learning under the two
conditions of knowledge. The simple analysis
of variance also estimated the rates of learning
as described in the previous paragraph,
but the “before” results under the two conditions,
although not significantly different,
were not identical, and in these circumstances
analysis of covariance is the more appropriate
method to use. This showed no significant
difference between “after” results of the
knowledge and no knowledge conditions. The
adjusted “after” means of 10 runs obtained
for this analysis were 9.86 sec. for the knowledge
group, and 8.51 sec. for the no knowledge
group, while the standard error of the
differences between means was 0.87.

Analysis of the number of contacts “on”
was undertaken in the same way as the time
in zone by both analysis of variance and
analysis of covariance. Analysis of variance
showed no significant difference between
knowledge and no knowledge groups averaged
over “before” and “after” times, nor between
“before” and “after” times averaged
over the knowledge and no knowledge groups.
As before, the Group × Time interaction was
not significant. The mean difference between
“after” and “before” is 9.3 ± 5.84 contacts/10 runs.

Analysis of covariance showed no signifi- 
cant difference between knowledge and no 
knowledge groups on the basis of “after” re- 
sults with “before” results held constant. 
The mean number of contacts was 73.2 for 
the knowledge group and 74.4 for the no 
knowledge group while the standard error of 
the difference between the means was 9.39. 
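The adjustment underlying an analysis of covariance can be illustrated briefly. The sketch below is a minimal version of the textbook procedure, not the authors' computation: each group's “after” mean is corrected for its “before” mean using the pooled within-group regression slope. The scores are hypothetical; only the two adjusted means and the standard error in the last lines are taken from the text.

```python
def ancova_adjusted_means(groups):
    """groups maps a group name to a list of (before, after) score pairs.

    Returns (adjusted_means, pooled_within_group_slope).
    """
    all_before = [b for pairs in groups.values() for b, _ in pairs]
    grand_before = sum(all_before) / len(all_before)

    sxy = sxx = 0.0
    group_means = {}
    for name, pairs in groups.items():
        mean_before = sum(b for b, _ in pairs) / len(pairs)
        mean_after = sum(a for _, a in pairs) / len(pairs)
        group_means[name] = (mean_before, mean_after)
        sxy += sum((b - mean_before) * (a - mean_after) for b, a in pairs)
        sxx += sum((b - mean_before) ** 2 for b, _ in pairs)

    slope = sxy / sxx
    adjusted = {
        name: mean_after - slope * (mean_before - grand_before)
        for name, (mean_before, mean_after) in group_means.items()
    }
    return adjusted, slope

# Hypothetical (before, after) scores, for illustration only.
scores = {
    "knowledge": [(6.0, 9.0), (8.0, 11.0), (7.0, 10.0)],
    "no knowledge": [(7.0, 9.0), (9.0, 11.0), (8.0, 10.0)],
}
adjusted, slope = ancova_adjusted_means(scores)
print(adjusted, slope)

# Figures reported in the text for contacts "on": adjusted means and S.E.
ratio = (73.2 - 74.4) / 9.39
print(f"difference / S.E. = {ratio:.2f}")
```

With the reported figures the difference between adjusted means is only about one eighth of its standard error, which agrees with the statement that the groups did not differ.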

It will be noted that the analysis of results 
from the two scoring systems do not agree, 
since the time in zone showed a significant 
difference between “before” and “after” while 
the number of contacts “on” did not. Con- 
sideration of the apparatus suggests that the 
time in zone scores, which have the additional 
advantage of being absolute, may reasonably 
be considered to be the more accurate. 

Straight lines were fitted to the results of 
time in zone (see Fig. 2) with the abscissa 
being the number of runs grouped in 50’s. 
Lines were then fitted from runs 0-400 and 
from 400-800. The mean lines fitted for each 
group were then compared with respect to 
variation of the lines for the individual men 
within groups. All slopes were positive show- 
ing that continuous learning was taking place. 

No significant difference was found between 







the slopes of the knowledge and the no knowl- 
edge lines measured over 0-400 and 400-800; 
there was also no significant difference be- 
tween the scores on runs 0-400 and 400-800 
averaged over knowledge and no knowledge. 
The apparent levelling off in the learning 
curve after 400 runs had been completed, 
especially in the case of the knowledge group, 
was not significant for either group. 
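The line fitting described above amounts to an ordinary least-squares fit made separately to the two halves of the trial. The following sketch uses hypothetical time-in-zone scores for 16 blocks of 50 runs; the data and names are illustrative assumptions and are not taken from Fig. 2.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical scores for one S: blocks 1-8 cover runs 0-400, blocks 9-16 runs 400-800.
blocks_first = list(range(1, 9))
blocks_second = list(range(9, 17))
scores_first = [10.0 + 0.5 * b for b in blocks_first]
scores_second = [14.0 + 0.25 * (b - 8) for b in blocks_second]

slope_first, _ = fit_line(blocks_first, scores_first)
slope_second, _ = fit_line(blocks_second, scores_second)
print(f"slope over runs 0-400:   {slope_first:.2f} sec. per block")
print(f"slope over runs 400-800: {slope_second:.2f} sec. per block")
```

Comparing the mean slopes of the two groups against the variation among individual slopes, as the text describes, would then be a further step not shown here.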

A comparison was made of time in zone 
and of contacts “on” between aiming with 
and without a target. As might be expected 
it is easier to aim with than without a target; 
all differences were significant. Thus the 
three ascending levels of information sup- 
plied, i.e., no target, target and no knowledge, 
and target with knowledge resulted, after 
training, in three significantly different and 
ascending levels of performance. 

It is of interest to compare the ratios of 
“ahead,” “on,” and “behind” contacts for 
both the knowledge and the no knowledge 
groups and for runs 0-100, 100-200, etc. as
learning progresses. Results for the third 
group of six men have been selected and their 
“ahead,” “on,” “behind” numbers of contacts 
expressed as percentages of “ahead” + “on” 
+ “behind” contacts for the series of runs 0-100,
100-200, etc. This eliminates any total
differences between knowledge and no knowl- 
edge and between the runs. It is convenient 
to refer to “ahead,” “on,” “behind” as Posi- 
tions. This analysis is shown in Tables 3 
and 4. 
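The conversion to percentages is a simple normalization of the three counter totals. A minimal sketch follows, with hypothetical contact counts and a hypothetical function name:

```python
def position_percentages(ahead, on, behind):
    """Express the three contact counts as percentages of their total."""
    total = ahead + on + behind
    return {
        "ahead": 100.0 * ahead / total,
        "on": 100.0 * on / total,
        "behind": 100.0 * behind / total,
    }

# Hypothetical counts for one group over one series of 100 runs.
pct = position_percentages(ahead=30, on=160, behind=10)
print(pct)
```

Because the three percentages always sum to 100, differences in the total number of contacts between groups or between series of runs cannot affect the comparison.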

Table 3

Analysis of Variance. Percentage Number of Contacts
“On,” “Ahead” and “Behind” for Group 3 (Six Men)

Source                    df     Mean Square

Position                   2      77,309.03
Runs within position      21         114.29
Groups within position     3         415.91
Residual                 117          63.32

Total                    143

Table 4

Mean Percentage Number of Contacts

                 “Ahead”    “On”    “Behind”

Knowledge           7.4     79.9
No knowledge       14.8     79.4

Note. The S.E. of the difference between knowledge and no
knowledge means in one position is 2.30.

The significance of the Runs within Positions
term (P < 5%) indicates that the pattern
of the ratios of “ahead” to “on” to “behind,”
averaged over both knowledge and no
knowledge, is changing significantly between
the runs. As the trial proceeded, Ss learned
to aim correctly as shown by the increasing
percentage of contacts “on” observable in the
data.

The Groups within Positions term is signifi- 
cant (P < 0.1%), and this suggests that av- 
eraged over all runs, the ratios of “on” to 
“ahead” to “behind” percentage contacts are 
different for “knowledge” and “no knowledge”
as is clearer in Table 4. From this table it is 
clear that additional knowledge of results 
changed a slight tendency to aim ahead of 
the point of aim, into a slight tendency to aim 
behind.

The Groups × Runs interaction within Positions
is not significant showing that the dif-
ference in ratios of “on” to “ahead” to “be- 
hind” for knowledge and no knowledge is con- 
sistent over the Runs. 


Discussion 


Improved apparatus made it possible to ask 
several questions suggested by a pilot experi- 
ment. The knowledge of results provided, 
while random and intermittent and less spe- 
cific (since it gave knowledge of time in the 
zone and not time on the correct point of aim 
as in the previous experiment), proved on two 
occasions a significant help to the group who 
were given it, the average improvement being 
about 12%. This percentage is half the im- 
provement found in the previous experiment 
where Ss were given twice the additional 
knowledge. 

The effect of removing extra knowledge of 
results was also studied. There are two sig- 
nificant differences between the knowledge 
and no knowledge groups just before the 
knowledge was stopped and none after, but 







there are no significant differences between 
rates of learning. 

The Ss learned to swing smoothly and 
steadily when engaged on the easier task of 
aiming at a target, as is shown by their scores’ 
being better on the aiming without a target 
situation after training than before, although 
this was only significant for time in zone. 
Lincoln (5) also carried out an experiment on 
learning a rate of movement, a task which 
can be performed electronically by a velodyne 
and which offers an interesting analogy in- 
cluding a mathematical treatment. Lincoln 
concluded that “extra knowledge of results
provided by the visual aid was a detriment 
during the criterion trials.” This finding is 
not supported by the results of the present 
experiment, since, under the condition of aim- 
ing without a target, after training with a 
target, the analysis of covariance did not 
show that the improvement of the knowledge 
group was significantly different from that of 
the no knowledge group. The conditions of
the two trials were, however, markedly different.

Elwell and Grindley (1), using a lower level
of information than in the present experiment,
suggest that knowledge of results operates by
causing a tendency to repeat movements lead- 
ing to a satisfactory result, by a directive ef- 
fect and by an effect on motivation. They 
also stress the importance of such knowledge 
on maintaining a skill. 

The use of random intermittent knowledge 
of results was partly due to the opinion of 
Jenkins and Stanley (4) that knowledge 
gained under a partial reinforcement regime 
is less rapidly forgotten than under a condi- 
tion of total knowledge of results, but it was 
also due to a desire to investigate the findings 
of Goldstein and Rittenhouse (3) which ap- 
peared to disagree with the opinions of the 
former workers. Goldstein and Rittenhouse 
did not investigate whether partial reinforcement led to as
rapid extinction as total reinforcement. The
present experiment offers no evidence in favor 
of intermittent knowledge of results, i.e., 
partial reinforcement (nor can any compari- 
sons be made between various forms of knowl- 




edge of results) on the criterion of resistance 
to extinction. 

The practical conclusion to be drawn from 
these findings is that where artificial knowl- 
edge of results of this type is to be removed 
in the real situation there is no justification 
for providing it in the training period. The 
findings also suggest that the level of perform- 
ance attained in a skill is partly related to the 
amount of information available during the 
performance of that skill. 


Summary 


A minimum of knowledge of results is nec- 
essary for learning. The effect of additional 
knowledge of results on learning to aim at 
moving targets using a synthetic trainer is de- 
scribed. Thirty Ss were tested for two half- 
hourly periods on each of 10 days. Fifteen 
of these Ss received random intermittent 
knowledge of results for the first five days 
only. Analysis of the results showed that 
there were significant differences between the 
groups towards the end of the training period 
but these differences were no longer signifi- 
cant when the additional knowledge of results 
was removed. 


Received May 21, 1957. 


References 


1. Elwell, J. L., & Grindley, G. C. The effect of
knowledge of results on learning and performance.
Brit. J. Psychol., 1938, 24, 39-53.

2. Gibbs, C. B., & Brown, I. D. Unpublished manuscript,
Applied Psychology Research Unit,
Medical Research Center, Cambridge, England, 1955.

3. Goldstein, M., & Rittenhouse, C. H. Knowledge
of results in the acquisition and transfer of a
gunnery skill. J. exp. Psychol., 1954, 48, 187-196.

4. Jenkins, W. O., & Stanley, J. C. Partial reinforcement:
A review and a critique. Psychol.
Bull., 1950, 47, 193-234.

5. Lincoln, R. S. Learning a rate of movement. J.
exp. Psychol., 1954, 47, 465-470.

6. Thorndike, E. L. The fundamentals of learning.
New York: Columbia University, 1932.

7. Trowbridge, M. H., & Cason, H. An experimental
study of Thorndike’s theory of learning.
J. gen. Psychol., 1932, 7, 245-260.





Journal of Applied Psychology
Vol. 42, No. 3, 1958


Changes in Human Relations Attitudes 


Aaron J. Spector 


U.S. Naval Personnel Field Activity, Washington, D. C.¹


A unique feature of the AFROTC’s seminar 
in leadership and management is the stress 
laid upon attitudinal, rather than cognitive, 
changes in the students as a function of new 
information acquired in lectures, insights 
gained by role-playing related to human re- 
lations problems, and integration of ideas 
through group discussion. To further this 
end a new textbook was prepared and intro- 
duced in 1954 (2). 

Since, in the past, only moderate success 
has been achieved in attempts to induce atti- 
tudinal changes experimentally among college 
students—coupled with the fact that there is 
a paucity of published research indicating 
that human relations training programs do 
any more than make the students feel happy 
they attended the course—the O.E.R.L. was 
amenable to conducting a study of the course’s 
effectiveness in achieving its objective, viz., 
attitude changes, although by necessity the 
research design left much to be desired. The 
evaluation study is herein reported. 


The Evaluation Instrument 


In the early stages of constructing the Atti- 
tudes Test in Human Relations (4), items 
were assembled from related instruments, 
from suggestions of Air Force personnel, and 
from a priori considerations of the project 
officer. Upon review of the items by six hu- 
man relations experts and by several educa- 
tional specialists of Air University, 161 of the 
original items were judged to be appropriate 
content material for a test of this nature. 
The items were then assembled into Form i 
of ATHURE, on which the respondents were 
instructed to answer each question by check- 
ing, on a scale, one of the following responses: 

1 The research reported here was conducted while 
the author was employed at the Officer Education 

Research Laboratory, Maxwell Air Force Base, Ala- 
bama. Personal views or opinions expressed or im- 
plied in this publication are not to be construed as 
necessarily carrying the official sanction of the De- 


partment of the Air Force or the Air Research and 
Development Command. 




“disagree completely,” “disagree partly,” “agree partly,”
or “agree completely.”


The Criterion 


ATHURE Form i had been previously ad- 
ministered to 494 Air Force officers attend- 
ing the Command and Staff School (CSS), 
class of 1955.² Additional data concerning
the CSS students were obtained from the 
Officer Behavior Description (OBD), an ob- 
jective instrument which described the sub- 
ject officer’s human relations behavior “on the 
job” (3). The upper 27% (N = 76) of offi- 
cers on this measure was selected as the cri- 
terion group for the present study, since it 
was presumed that officers who had unusually
good human relations behavior “on the
job” were guided by the same attitudes as 
the Leadership and Management Seminar was 
trying to induce in the AFROTC cadets. 


Procedure 


Sample. Ten AFROTC detachments were assigned 
by Hq. AFROTC to take part in this evaluation 
study. Although the detachments were selected on 
an “availability,” rather than a random or repre- 
sentative basis there is no reason to believe they 
differed from the population of detachments on any 
important variables. Each detachment was requested 
to administer the ATHURE Form i to one section 
of the Leadership and Management Seminar with 
the stipulation that the selected section contain be- 
tween 20 and 30 students. 

Method. The Leadership and Management Semi- 
nar is required for all senior cadets; during the school 
year 1954-55 it was offered at the beginning of the 
year. Since this evaluation study was not proposed 
until after the students had registered, it was not 
possible to randomize students, to provide control 
groups, or to rearrange the curriculum. Despite the 
obvious limitations of an uncontrolled study, which 
was the only practicable course of action under these 
conditions, it was felt that the findings could still be 
of some value if interpreted cautiously. 

Copies of ATHURE Form i were mailed to the 
participating detachments with instructions for ad- 


2 On the basis of these data “preference” and “discrimination”
indices were obtained and a forced-choice
test developed (4).







ministering them at the beginning and end of the 
seminar. 


Results 


The items were dichotomized into “agree”
and “disagree” categories and comparisons
were made between the percentage of “disagree”
responses of the criterion group and
the combined AFROTC detachments. For
the analyses discussed below, the only items
examined were those on which there was a
difference of at least 10% between the pretest
and the criterion group’s “disagree” responses.³
The attitudes reflected by the 39
items which met this requirement were considered
to be susceptible to improvement by
training.⁴

Table 1 shows the percentage disagree for 
the pretest, posttest, and the criterion group; 
Columns 5 and 6 indicate the direction of 
attitude change, whether in the direction of 
the criterion group’s attitude or away from 
it. Of the 39 items, 35 show movement to- 
ward the criterion group responses. 

Greater attitudinal changes were evidenced 
on those items where the differences were 
greatest between the pretest and the criterion. 
For example, on every item where the differ- 
ence was greater than 50%, the change from 
pre- to posttest was greater than 50%. 

The data were then examined to identify 
items on which the following differences were 


3 Here the formula
$$\frac{P_1 - P_2}{\sqrt{\dfrac{P_1 Q_1}{N_1} + \dfrac{P_2 Q_2}{N_2}}}$$
was used to estimate the percentage difference needed 
for significance. With the n’s of these samples a differ- 
ence of 12% is significant at the .05 level for two 
percentages in the 40%-60% range. For the same 
significance level the difference diminishes as the per- 
centage distribution extremes are approached until only 
a difference of 8% is required in the decile at each 
extreme. Consequently, a 10% difference seemed ac- 
ceptable throughout the distribution of differences. 
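The figures in this footnote can be approximated from the standard error of a difference between two independent proportions. The sketch below is an illustration under stated assumptions: the criterion group size of 76 is taken from the text, while the cadet sample size of 250 is hypothetical (ten sections of 20 to 30 students), since the paper does not report it. The value 1.96 is the usual two-tailed normal deviate for the .05 level.

```python
import math

def critical_difference(p1, n1, p2, n2, z=1.96):
    """Smallest difference between two independent proportions that
    reaches significance at the level corresponding to z."""
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return z * standard_error

N_CRITERION = 76   # from the text
N_CADETS = 250     # hypothetical: ten sections of 20-30 students

mid_range = critical_difference(0.50, N_CRITERION, 0.50, N_CADETS)
extreme = critical_difference(0.10, N_CRITERION, 0.10, N_CADETS)

print(f"near 50%: about {100 * mid_range:.0f} percentage points")
print(f"near 10%: about {100 * extreme:.0f} percentage points")
```

Under these assumptions the required difference is roughly 13 points in the middle of the range and roughly 8 points near the extremes, which is of the same order as the 12% and 8% quoted.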

4 For purposes other than this study the distribution
of responses among the four alternatives to each
question of the AFROTC group were compared with 
the responses of the 494 CSS students. Chi-square 
analysis of the two groups for each of the 161 items 
indicates that on 129 items the responses were sig- 
nificantly different at the .05 level or better (105 of 
these were significant at the .001 level). It seems 
quite apparent that the human relations attitudes of 
senior AFROTC cadets, prior to the new course, 
were markedly variant from the attitudes of field 
grade officers attending the CSS. 


Table 1

Percentage Responding “Disagree” on Selected
Human Relations Items

Columns: (1) Item; (2) Pretest %; (3) Posttest %; (4) CSS %;
Direction of Change: (5) Toward, (6) Away.

(The entries for the 39 items are not legible in this copy.)


obtained: (a) a significant difference between
the pretest and the criterion, (b) a significant
difference between the pre- and posttest,⁵ and
(c) a difference which was not significant between
the posttest and the criterion.


5 Since these are correlated proportions, the differ- 
ence needed for significance reduces to 6%. 







The items which met requirements (a) and 
(b) above, but which did not meet (c)
showed a significant change from the pre- to 
the posttest, although responses on the latter 
still were greater than 10% away from the 
criterion. The 10 items [2, 14, 16, 17, 25, 
26, 27, 34, 38, 39] which were thus identified 
showed changes, presumably as a function of 
the course, but the responses of the cadets 
and of the criterion group still differed. 

A second type of relationship which reflects 
favorably upon the effect of the seminar is 
one in which the items met requirements (a) 
and (c) above, but did not meet (b). The 
two items [3 and 6] of this type revealed a 
difference between the AFROTC group be- 
fore the seminar and the criterion but no 
difference after instruction. 

Sixteen items [15, 18, 19, 20, 21, 22, 23, 
24, 28, 29, 30, 31, 32, 35, 36, 37] met all 
three criteria; the initial significant differ- 
ences between these attitudes of cadets and 
the criterion were reduced to the magnitude 
of chance differences as a result of large 
changes in cadet attitudes from the pre- to the 
posttest. These items constitute the most
convincing evidence of the effectiveness of the
course in changing cadet attitudes. On many
of the items, the response shift from pre- to
posttest is dramatic in size; for example, on 
one third of the items [21, 28, 32, 35, 36] 
the average change is 80%. Despite the large 
changes the posttest responses of these items 
were, on the average, less than 4% away 
from the criterion. 
Typical of these items are the following: 


[19] The effective supervisor makes himself well 
liked by his subordinates. [21] How a man thinks 
he is being treated is usually more important than 
the treatment he actually receives. [28] A good rea- 
son for not encouraging suggestions is that most of 
the suggestions are poor ones. [32] Men are un- 
happy unless the leader tells them exactly how to 
do their jobs. [35] Emotions distort our perception 
of events. [36] You defeat your own purposes 
when you present both sides of the case to a person 
whom you are trying to influence. [37] If you give 
people the facts they will act in a rational manner. 


Thus, on 26 of the 39 items there was a 
significant change in the cadets’ responses on 
the two tests, while on 18 of the 39 items the 
differences between the cadet and the cri-
terion group responses were reduced to non-
significance on the posttest. On 11 of the 
items the responses remained virtually un- 
changed. 
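
The tallies in this paragraph follow from the three item groups listed earlier. A minimal Python sketch of that bookkeeping is given here; the variable names are illustrative only, and the "unchanged" count is taken to be the items that met requirement (a) alone.

```python
# Item numbers exactly as listed in the text for each pattern of results.
# (a) pretest differs from criterion; (b) pre-to-post change is significant;
# (c) posttest no longer differs from criterion.
met_a_and_b_only = [2, 14, 16, 17, 25, 26, 27, 34, 38, 39]
met_a_and_c_only = [3, 6]
met_all_three = [15, 18, 19, 20, 21, 22, 23, 24,
                 28, 29, 30, 31, 32, 35, 36, 37]
TOTAL_ITEMS = 39  # items with a significant pretest-criterion difference

# Items whose responses changed significantly: requirement (b).
changed = len(met_a_and_b_only) + len(met_all_three)

# Items whose difference from the criterion became nonsignificant: requirement (c).
closed_gap = len(met_a_and_c_only) + len(met_all_three)

# Items in none of the three groups (assumed here to be the "unchanged" items).
unchanged = TOTAL_ITEMS - (
    len(met_a_and_b_only) + len(met_a_and_c_only) + len(met_all_three)
)

print(changed, closed_gap, unchanged)  # 26 18 11
```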

Another indication of the effects of the 
seminar (or of some experience during the 
semester) may be shown by comparing the 
absolute differences between the criterion and 
the pretest with the absolute differences be- 
tween the criterion and the posttest. If the 
cadet attitudes changed as a result of the 
seminar in the direction of the criterion 
group’s attitudes, the average absolute per- 
centage difference in the second case should 
be smaller than in the first. In this case, the 
average difference was reduced by half; at 
the beginning of the term it was 24.6% while 
at the end it was 12.0%. 
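
The comparison described above amounts to averaging the absolute percentage gaps between the cadet and criterion responses, once for the pretest and once for the posttest. The sketch below shows that computation in Python; the three-item profiles are hypothetical values chosen for illustration, not the study's data, and only the 24.6% and 12.0% figures come from the text.

```python
def mean_abs_diff(cadet_pct, criterion_pct):
    """Average absolute gap, in percentage points, between two response profiles."""
    if len(cadet_pct) != len(criterion_pct):
        raise ValueError("profiles must cover the same items")
    gaps = [abs(c - k) for c, k in zip(cadet_pct, criterion_pct)]
    return sum(gaps) / len(gaps)

# Hypothetical "disagree" percentages on three items (illustration only).
criterion = [80.0, 20.0, 60.0]
pretest = [50.0, 45.0, 40.0]
posttest = [70.0, 30.0, 55.0]

before = mean_abs_diff(pretest, criterion)   # (30 + 25 + 20) / 3
after = mean_abs_diff(posttest, criterion)   # (10 + 10 + 5) / 3

# Ratio of the averages reported in the text: 12.0% against 24.6%.
reported_ratio = 12.0 / 24.6
print(before, after, round(reported_ratio, 2))
```

A smaller value after the seminar than before indicates movement toward the criterion group; the reported ratio of about 0.49 is what the text describes as a reduction by half.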

One other analysis was conducted in which 
the responses of the cadet groups of the 10 
participating schools were examined for each 
of the 161 items. About two thirds of those 
items which showed greater than a 5% dif- 
ference between the pretest response and the 
criterion response changed in the direction of 
the criterion group’s responses. While one 
school had only 53% of the items changed in 
the direction of the criterion, changes of the 
other nine schools ranged between 63% and 
73%. There appeared to be relatively little 
difference between schools, which in itself is 
a matter of importance in administration of 
a training program of this magnitude. 

Although a control group would undoubt- 
edly have improved this evaluation study, the 
weight of the evidence indicating attitudinal 
changes from the beginning to the end of the 
seminar was so great that judgment of the 
effectiveness of the seminar could be made with
a fair degree of confidence. Aside from its 
value for administrative ends this study pro- 
vides evidence that highly “culture-laden” 
attitudes can be changed by classroom pro- 
cedures. Unfortunately, nothing can be said 
about the relative contribution to attitude 
change of the various pedagogic techniques
employed.

Summary 


A preliminary form of the Attitudes Test in 
Human Relations (ATHURE) was used to 
evaluate the amount of cadet attitude change
as a result of a new seminar offered in the
senior year of the AFROTC program. The 
sample consisted of 10 AFROTC detach- 
ments, one class in each, which were tested 
before and after the seminar. The 1955 
Command and Staff School class were also 
given the ATHURE Form i. Additional
data on these officers were obtained on the 
Officer Behavior Description, an objective in- 
strument which describes the subject officer’s 
human relations behavior “on the job.” Offi- 
cers in the top 27% on the OBD were se- 
lected as the criterion group for the ATHURE. 

The cadets’ pre- and posttest scores and 
the criterion group’s scores were analyzed. 
Of the 39 items on which there was a signifi-
cant difference in “disagree” responses be-
tween the pretest and the criterion group, on
18 of the items the difference was no longer 
significant on the posttest. On 26 of the
items the cadets’ responses changed signifi-
cantly. These changes were fairly consist-
ent among all 10 detachments. The findings 
indicate that there were sufficient attitude 
changes in the direction of the criterion group 
to warrant confidence in the seminar’s effec- 
tiveness in changing human relations attitudes. 


Received June 10, 1957. 


References 


1. Preston, H. O. The development of a procedure
for evaluating officers in the United States Air
Force. Pittsburgh: American Institute for Re-
search, 1948.

2. Principles of leadership and management. Mont-
gomery, Ala.: Air Univer., Air Force Reserve
Officers’ Training Corps, 1954.

3. Spector, A. J. Human relations behavior on the
job: The Officer Behavior Description. J.
appl. Psychol., 1957, 41, 110-113.

4. Spector, A. J. The Attitudes Test in Human Re-
lations (ATHURE). J. appl. Psychol., 1957,
41, 209-213.





Journal of Applied Psychology 
Vol. 42, No. 3, 1958 


Numeral Form as a Variable in Numeral Visibility 


Robert S. Soar 
Vanderbilt University 


It can reasonably be assumed that if the 
experimenter had the option of starting from 
the very beginning to design a set of 10 sym- 
bols to serve as numerals, a much more effi- 
cient set could be designed than the conven- 
tional ones, but the less radical approach of 
altering the presently existing forms has been 
given little attention. Past research has been 
primarily concerned with the dimensions of 
various aspects of numerals, such as height, 
width, and stroke width, as they influenced 
visibility, in the attempt to improve the visi- 
bility of highway signs, license plates, in- 
strument dials, and other such applications. 
Relatively little has been done to increase 
visibility by altering the form or configura- 
tion of the number, however. 

Berger (3) studied this problem by a rather 
mechanical approach involving such proce- 
dures as altering the angle with which the 
vertical of the 7 met the top bar; the angle 
with which the diagonals of the 8 intersect; 
the placement of the central intersection of 
the 3; the length of the top bar of the 5; 
whether the 4 should be open or closed; and 
the relative positions of its vertical and hori- 
zontal. 

Craik (5) reported that the forms for the 
6 and 9 which he presented in one of his dia- 
grams were less confused with each other and 
with the 8 than the numerals which had been 
used on British aircraft instrument dials up 
to that time. No discussion was given to the 
variants studied, or procedures used. 

Bartlett and Mackworth (2) present a set 
of letters and numerals found to be superior 
as identifying symbols to be used on map 
tables in aircraft spotting filter centers. Other 
variables studied in relation to visibility are 
discussed at some length, but nothing is re- 
ported concerning changes in numeral and 
letter form shown.¹


1 During the course of this study, J. P. Foley has
published in this Journal (1956, 40, 178-180) re-
sults showing an experimentally developed set of
“angular” digits to be more visible than those of
Bartlett and Mackworth (2).


Brown, Lowery, and Willis (4) compared
the visibilities of the Berger numbers with
those of the Army-Navy standard numbers,
and found several significant differences. On
the basis of these obtained differences they
made further unspecified changes in forms of
the numerals in designing a set they felt to
be superior to either of the parent sets. At-
kinson, Crumley, and Willis (1) have verified
the superiority of this set of numbers.

The study most nearly concerned with this
problem was done by McLaughlin (7). He
analyzed the forms of the Army-Navy stand-
ard numerals rationally, and drew a new set
of forms differing rather radically from the
standard in such a way as to maximize the
differences between numerals. He drew a
second set using the same bolder stroke width
as his experimental set but with the form of
the Army-Navy standard, and used as a third
set the Army-Navy standard. He used an
unusual means of presentation involving ta-
chistoscopic projection at very low levels of
contrast, and found numeral form to be an
insignificant variable, but stroke width to be
significant. The stroke width to height ratios
he used were 1:7.1 and 1:5.3, which are well
within the optimal range as determined by a
number of other studies. This contradiction
of the results agreed on by other studies, to-
gether with the unusual experimental pro-
cedure, throws doubt on McLaughlin’s find-
ings and suggests the need for further in-
vestigation.

The need for such study is emphasized by
the finding that the most visible number is
typically at least twice as visible as the least
visible one, however the difference is meas-
ured; and that the errors are nonrandom (8).

The problem then is to design a more visi-
ble set of numerals by altering the currently
most visible forms, yet without departing suf-
ficiently from the familiar as to require re-
learning by the observer.


Table 1

Errors in Reading Maximally Legible
Numeral Forms *

[Confusion matrix of stimulus numeral by subject’s report; cell entries are not recoverable.]

* Based on 108 observations per numeral.


Procedure 


The first step in planning new numeral designs was 
the assembly of data from an earlier study (8) and 
additional data collected by the writer on confusions 
between numerals. These data are presented in 
Table 1. 

It can be seen that the 0 was the most often con- 
fused with the 6, secondarily with the 9; the 1 with 
the 4; the 2 with the 8; the 5 with the 6, and sec- 
ondarily with the 8; the 6 with the 4; and the 8 
with the 6 and 9. The 3, 4, 7 and 9 appear to be 
confused with other numerals in random fashion. 

These nonrandom patterns of confusion provided 
the basis for designing the experimental numerals. 
An attempt was made to minimize the elements 
common to the confused numerals, and to emphasize 
the unique elements. Two approaches were taken: 
one of altering forms as such, the other of varying 
stroke width within the numeral. Three plates of 
experimental numerals were prepared by these means: 
one by altering stroke width, a second by altering 
form, and a third in which both alterations were 
made simultaneously. A fourth set of experimental 
numerals was prepared which duplicated as closely
as possible those designed by McLaughlin (7). The 
control numerals were those whose form had been 
found to be maximally visible by Brown et al. (4), 
verified by Atkinson et al. (1), and whose optimal 
height-width combination and stroke width had been 
determined by Soar (8, 9). 

The stimulus numerals were hand-drawn, 2 inches 
high and 1.5 inches wide, and with stroke widths of 
2-4 of an inch. The copies of McLaughlin’s nu- 
merals were drawn to the same external dimensions, 
but with a maximum stroke width of 37 in. This
was estimated to be scale for the maximum stroke
width he used, from measurements of enlargements 
of his numerals. These were then reduced to one- 
twentieth of original size photographically, resulting 
in stimulus figures approximately the size of 10-point 
type. These are shown, actual size, in Fig. 1. 

One hundred college student Ss were used, provid- 
ing 20 replications of the five experimental condi- 
tions, with each S randomly assigned to one experi- 
mental condition. All Ss had normal visual acuity,
as tested by a near vision eye chart, at the distance 
at which the observations were to be made. 

The apparatus used was the Harvard Tachisto- 
scope, manufactured by Ralph Gerbrands of Arling- 
ton, Massachusetts. 

Each S was first shown several long exposures to 
insure that he understood the procedure. Then a 
practice series of 30 exposures of 10-point Century 
numerals, identical for all Ss, was run. Following 
this, 90 exposures of the experimental numerals were 
presented. A different order of presentation of the 
experimental series was employed for each S, to 
cancel out any possible bias introduced by the order 
of presentation which might influence the various 
numerals differentially. Each order was randomly 
drawn, with the single restriction that each numeral 
should appear an equal number of times. 
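
The ordering rule described above (a random order for each S, restricted so that every numeral appears equally often) can be sketched in Python as follows. With 90 exposures and 10 numerals, each numeral appears 9 times. The function name and the seed are illustrative assumptions; the original orders were presumably drawn by hand or from a random number table.

```python
import random
from collections import Counter

def presentation_order(rng, numerals=range(10), repeats=9):
    """Return a random order in which every numeral appears `repeats` times."""
    order = [n for n in numerals for _ in range(repeats)]
    rng.shuffle(order)  # shuffling in place preserves the equal counts
    return order

rng = random.Random(1958)        # arbitrary seed, for reproducibility only
order = presentation_order(rng)  # one S's series of 90 exposures

counts = Counter(order)
print(len(order), sorted(counts.values()))
```

Drawing a fresh order for each S, for instance by calling the function again with the same generator, gives the different sequences the procedure calls for.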

The exposures were 40 milliseconds in duration, 
made under approximately one foot candle of illumi- 
nation. 

Fig. 1. Experimental numeral forms.


Table 2

Adjusted Means

[Adjusted mean visibility of each numeral in each of the five sets; cell entries are not recoverable.]

* Mean number of correct readings per S; 9 possible.


Analysis of the data was carried out by analysis of
variance and covariance. The performance of the S
in the practice period was scored and used in the
covariance analysis as an index of his ability at the
perceptual task. Since differences in visibility be-
tween different forms of the same numeral rather
than among various numerals were the central con-
cern, a separate analysis was made for each numeral.
In addition, the mean visibility of each number of 
each experimental set was adjusted to eliminate sta- 
tistically the effect of differences in perceptual abil- 
ity of the several experimental groups. 
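
The adjustment of group means for practice-period performance can be illustrated with the usual analysis-of-covariance formula, in which each group mean is corrected by the pooled within-group regression slope times the group's deviation from the grand covariate mean. The article does not give its computational formula, so the Python sketch below is the textbook version under that assumption, and the two small groups are hypothetical.

```python
from statistics import mean

def adjusted_means(groups):
    """groups maps a set name to a list of (practice score, visibility score) pairs.

    Returns each group's criterion mean adjusted to the grand covariate mean,
    using the pooled within-group regression slope.
    """
    sxy = 0.0
    sxx = 0.0
    all_x = []
    group_stats = {}
    for name, pairs in groups.items():
        xs = [x for x, _ in pairs]
        ys = [y for _, y in pairs]
        mx, my = mean(xs), mean(ys)
        group_stats[name] = (mx, my)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x in xs)
        all_x.extend(xs)
    slope = sxy / sxx          # pooled within-group slope
    grand_x = mean(all_x)      # grand mean of the covariate
    return {name: my - slope * (mx - grand_x)
            for name, (mx, my) in group_stats.items()}

# Hypothetical data: group B scored higher in practice and on the numeral.
groups = {
    "A": [(10, 3), (12, 4), (14, 5)],
    "B": [(14, 5), (16, 6), (18, 7)],
}
adj = adjusted_means(groups)
print(adj)  # {'A': 5.0, 'B': 5.0}
```

In this constructed case the raw means differ (4 against 6), but the difference disappears once practice performance is taken into account, which is the kind of correction the adjusted means in Table 2 are meant to provide.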

Correlations of means and variances and apparent 
heterogeneity of variances were found in the data, 
so the significant analyses were recomputed after a 
square root transformation of the experimental read- 
ings. 
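
When scores are counts whose variance rises with the mean, a square root transformation tends to bring the group variances closer together. The Python sketch below illustrates this on two hypothetical sets of readings (each between 0 and 9 correct); it is not a reanalysis of the study's data.

```python
import math
from statistics import pvariance

# Hypothetical numbers of correct readings for two groups (illustration only).
low_group = [1, 2, 3, 2]
high_group = [4, 7, 9, 6]

def variance_ratio(a, b):
    """Ratio of the larger to the smaller population variance."""
    va, vb = pvariance(a), pvariance(b)
    return max(va, vb) / min(va, vb)

ratio_before = variance_ratio(low_group, high_group)

# Apply the square root transformation to every reading.
low_t = [math.sqrt(v) for v in low_group]
high_t = [math.sqrt(v) for v in high_group]
ratio_after = variance_ratio(low_t, high_t)

print(ratio_before > ratio_after)  # True
```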


Results 


The results are presented in Table 2. It 
can be seen that significant differences in visi- 
bility were obtained between the various de- 
signs of the numerals 0, 1, and 2 beyond the 
one tenth of one per cent level, of the 8 at 
the one per cent level, and of the 5 and 6 at 
the five per cent level. 

The 3, 4, and 7 are identical for sets A, C, 
D, and E. In a sense it is not surprising that
there should be no significant differences in 
visibility in these. The possibility existed, 
however, that increases in the visibility of 
other numbers confused with these might 
have improved their performance, or that set 
B might have been more visible. 

The data suggest that boldness, rather than 
form, is the important factor in the increased
visibility of some of the experimental numer- 
als. Other studies dealing with this factor 
have been reviewed, and this factor restudied 
in relation to another related variable (8, 9), 
but for the set of 10 numerals as a group. 


Table 2 (continued)

Adjusted Mean Visibilities* of the Ten Numerals in Five Different Forms

Significance Tests (F Ratio and P, as far as recoverable)

Original data: F = 5.55, 11.76, 22.20, .54, .48, 4.28, 3.89, 1.74, 3.92, 1.26; P = .001 for each of the first three.
Transformed data: F = 5.20, 14.64, 12.96; P = .001 for each.

Whether visibility may be increased by al- 
lowing stroke width to vary from numeral to 
numeral has not been studied, but there are 
suggestions in the study cited above that this 
may be so. The present study reinforces this 
suggestion and adds evidence that stroke 
width changes within various numerals may 
contribute to visibility. Differences within 
each significant analysis were studied by 
Tukey's procedure for comparing individual 
means (6), and the results are reported with 
the discussion below. Differences reported 
are significant at the five per cent level for 
the 5 and 6, and one per cent level for the 
others. 

The differences for the 0 are as follows: AC
< D < BE. The B figure is bold through-
out, and the D and E sets are bolder at the 
sides; these three are more visible than the 
A and C sets which are standard in stroke 
width throughout. Why the set D figure 
should be less visible than sets B and E is 
not clear. 

For the 1, sets A and C were less visible 
than B, D, and E. With the exception of the 
rounded ends in set B, the only differences 
from set to set were in boldness, with the 
bolder sets more visible. 

For the 2, the differences are: C < A <
DE < B. Form appears to be an important
variable here, as well as boldness. The B set
has its loop extending throughout its entire
height, and is the most visible. Sets D and
E like B are also bold, and follow in visibility. 
Sets A and C with the narrow stroke width 
are least visible, and the variation in form 
shown in set C is less visible than the stand- 
ard, form A. 

The differences for the 5 were as fol-
lows: A < C < BD < E. The simultaneous
changes of form and stroke width in set E 
result in maximum visibility, the stroke width 
changes in B and D in the next best, the 
form change in C is next best, and the stand- 
ard, A, is poorest. 

For the 6, set B was more visible, but the 
others did not differentiate. Both changes in 
form and stroke width were involved. 

Sets B, D, and E were more visible than 
A and C for the 8. Set E involves change in 
both form and stroke width; sets B and D involve
differing changes in stroke width only. Set C
involved change in form only, and set A was 
the standard. Boldness again appears to be 
the important variable. 

Changes in stroke width of the whole figure 
have apparently resulted in greater readabil- 
ity for the 0 and 1, and perhaps for the 2. 
Stroke width changes in part of the numeral 
apparently contribute to accuracy of percep- 
tion for the 5, 6, and 8, although these 
changes are to a degree also changes in form. 
Alteration of form is apparently important in 
the visibility improvement of the 2, 5, and 6. 

When all of the changes are considered, the 
hypothesis is suggested that with changes in 
boldness there may be two important, but 
partially opposed factors making for increased 
visibility—boldness of stroke width, and open- 
ness of white space within the area of the fig- 
ure. A relation between these two factors is 
implied in a statement of Craik’s (5) that the 
optimal stroke width for white numbers on a 
black background is equal to the average 
width of the enclosed area, but the relation- 
ship is not elaborated. 

In this study the 0 is sufficiently open that 
increased stroke width does not encroach 
greatly on the internal white space, only bold- 
ness is important to the 1, and the improved 
2 is a “simpler,” more open figure. The three
most visible 5’s have bolder stroke width but
without encroachment on open white space.
The set B 6 maximizes both factors to a
greater degree than any other set, and is the 
most visible. Sets D and E of the 8 com- 
bine boldness and openness, and are among 
the more visible. Set B, also more visible, is 
bold but not open. It seems reasonable, how- 
ever, that internal detail may not be impor- 
tant in recognition of this figure, but external 
form only. This may be the case for the 0 as 
well, but seems unlikely for the other figures. 

Both factors seem to be maximized in the 
set B 4, and it was the most accurately read 
in this study, although the difference was not 
significant. Comments of Ss suggest that its
form was sufficiently similar to the 7 as to 
cause confusion, with a number reporting 
realization of this error during their experi- 
mental observations. This may account for 
failure to find significant differences. 

The status of the 3, 7, and 9 is not clear 
in respect to these proposed factors. 

It should be noted that the apparent con- 
tradiction between the findings of this study 
and of other studies is not real. That is, 
other studies have determined the optimal 
stroke width for numerals of this height- 
width ratio. This study reports increases in 
visibility for several figures as a consequence 
of increased stroke width. The differing re- 
sults are readily reconciled, however, in that 
previous studies have determined an optimum 
for the ten digits as a group, whereas this 
study allowed variation from number to num- 
ber. Further, variation was studied within 
numerals. The results show, then, that some 
numerals require bolder representation than 
others, and that some numbers are improved 
in visibility by emphasizing their unique ele- 
ments with bolder strokes. 


Summary and Conclusions 


Starting from data showing which numerals 
are confused with which among the currently 
most visible set of experimentally developed 
numerals, three new sets were designed in 
which common elements were minimized, and 
unique elements emphasized. An additional 
set developed by another investigator was in- 
cluded, and all were compared with a set of 
the currently most visible numerals. Data 
were collected tachistoscopically, with 20 rep-
lications of the five experimental conditions.
Analyses of variance and covariance were per-
formed on the data, and six of the experi-
mentally developed numerals were shown to
be significantly more visible than the current
standard.

A hypothesis generalizing the results is pro- 
posed, which presumes two important but 
partially opposed variables in visibility of nu- 
merals to be boldness of stroke and openness 
of white space within the figure. 


Received June 11, 1957.


References

1. Atkinson, W. R., Crumley, L. M., & Willis,
Marion P. A study of the requirements for
letters, numbers, and markings to be used on
trans-illuminated aircraft control panels. Part
5, The comparative legibility of three fonts
for numerals. Report TED No. NAM EL-
609. Philadelphia: Naval Air Experimental
Station, 1952.

2. Bartlett, F., & Mackworth, N. H. Planned see-
ing. Air Ministry Air Publication 3139B.
London: His Majesty’s Stationery Office, 1950.

3. Berger, C. I. Stroke-width, form and horizontal
spacing of numerals as determinants of the
threshold of recognition. J. appl. Psychol.,
1944, 28, 208-231.

4. Brown, F. R., Lowery, E. A., & Willis, Marion P.
A study of the requirements for letters, num-
bers and markings to be used on trans-illumi-
nated aircraft control panels. Part 1, The
effect of stroke-width upon the legibility of
capital letters. Report TED No. NAM EL-
609. Philadelphia: Naval Air Experimental
Station, 1949.

5. Craik, K. J. W. Instrument lighting for night
use. Air Ministry Flying Personnel Res. Com-
mittee Rep. FPRC 342. London: 1941.

6. Edwards, A. L. Statistical methods for the be-
havioral sciences. New York: Rinehart, 1954.

7. McLaughlin, S. C. Configuration and stroke-
width in numeral legibility. Unpublished mas-
ter’s thesis, Tufts Coll. Library, November,
1948.

8. Soar, R. S. Height-width proportion and stroke-
width in numeral visibility. Unpublished doc-
toral dissertation, Univer. of Minnesota, Au-
gust, 1952.

9. Soar, R. S. Height-width proportion and stroke
width in numeral visibility. J. appl. Psychol.,
1955, 39, 43-46.





Journal of Applied Psychology
Vol. 42, No. 3, 1958 


Readability of Braille as a Function of Three Spacing
Variables ¹


Ernest Meyers, Doris Ethington 


University of Kentucky 


and Samuel Ashcroft 


American Printing House for the Blind, Louisville, Kentucky 


The purpose of this investigation was to 
determine, within limits, those spacing values 
of Braille print which permit greatest read- 
ability by blind children. The spacing values 
studied were the distance between dots within 
the six point Braille cell, the distance between 
Braille cells, and the distance between Braille 
lines. 

A search of the literature revealed that since 
Barbier and Braille devised dot systems for 
blind readers there has been considerable ef- 
fort to improve the Braille code, but little 
work has been concerned with the spacing of 
the units which make up the code. The Com- 
mission on Uniform Type for The Blind (4) 
did conduct a study of Braille readability 
as a function of four combinations of the 
above variables. The present specifications 
are based on the conclusions of this investi- 
gation (2). However there has been no ex- 
perimentation involving systematic variation 
of spacing variables. 


Method 


Subjects. 275 blind children in Grades 5-12 from 
state schools for the blind in Kentucky, Tennessee, 
Indiana, Illinois, and Ohio were given an initial read- 
ing test. Twenty-two Ss whose comprehension of 
the material was less than 50% were discarded. One 
hundred and eight of the remaining 253 Ss were se- 
lected to read the experimental material. The scores 
were divided into quartiles and 27 groups of four 
readers each were devised. Each group contained 
one reader from each quartile and the mean reading 
score of each group was approximately the same. 

Material and design. The present spacing specifi-
cations of Braille are: .090” between dots within
cells, .160” between Braille cells, .220” between
Braille lines. This experiment is concerned with
readability as these values are varied. Three values
of each variable were selected. They were: between
dots, .080”, .090”, and .100”; between Braille cells,
.123”, .140”, and .160”; between Braille lines, .163”,
.220”, and .300”.

1 This work was supported by a contract between
the American Printing House for the Blind and the
Kentucky Research Foundation, University of Ken-
tucky, and represents part of the research designated
in a contract between the American Printing House
for the Blind and the Library of Congress.

All of the reading material was from The Black 
Arrow (5), edited for easy reading. A Flesch (3) 
readability analysis was made on this novel and it 
proved to be readable by fifth-grade pupils. In or- 
der to obtain initial measures of reading ability, all 
Ss read from Chapters 12 and 13 printed with stand- 
ard specifications. 

The experimental material was Chapters one 
through eight of The Black Arrow. There are 27 
possible combinations of the above spacing values 
and each group of four was randomly assigned one 
of these combinations to read under constant condi- 
tions. Comprehension was controlled and the de- 
pendent variable was reading speed in words per 
minute. 
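
The 27 conditions are simply every combination of the three dot, cell, and line spacings, each assigned at random to one matched group of four readers. A minimal Python sketch of that layout follows; the group numbering and the seed are illustrative assumptions, and spacings are given in inches.

```python
import itertools
import random

# Spacing values in inches, as listed under "Material and design".
DOT_SPACING = [0.080, 0.090, 0.100]
CELL_SPACING = [0.123, 0.140, 0.160]
LINE_SPACING = [0.163, 0.220, 0.300]

# Every (dot, cell, line) combination: 3 x 3 x 3 = 27 conditions.
combinations = list(itertools.product(DOT_SPACING, CELL_SPACING, LINE_SPACING))

# Randomly assign one combination to each of the 27 matched groups.
rng = random.Random(0)  # arbitrary seed, for reproducibility only
shuffled = combinations[:]
rng.shuffle(shuffled)
assignment = {group: combo for group, combo in enumerate(shuffled, start=1)}

print(len(combinations), len(assignment) * 4)  # 27 108
```

The standard specification (.090” dots, .160” cells, .220” lines) is one of the 27 cells, so current practice is represented in the design.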

Meticulous care was taken to print the Braille as 
uniformly as possible. Dot height was held approxi-
mately constant at .015” and the base diameter of
the dots was .055”.

Procedure. Groups ranging from 5 to 19 Ss read 
the equating material for 30 minutes. Included in 
the instructions were directions to read as rapidly as 
possible without skipping. The Ss were told that 
they would be given a comprehension test when they 
finished reading. This consisted of 20 multiple-choice 
items. In computing comprehension scores, only 
those questions were counted which referred to ma- 
terial actually read. 

Each S read his assigned experimental material for 
two 50-minute periods on successive days. No more 
than four Ss read simultaneously. The instructions 
were similar to those of the equating part of the 
study, and a 30-item comprehension test was ad- 
ministered at the end of the second reading period. 
The E recorded the time that it took each S to 
finish reading each page. Twelve of the original 108 
Ss selected to read experimental Braille were re- 
placed by Ss with similar initial reading scores. They 
were replaced because of lack of comprehension of 
the experimental material, failure to follow instruc- 
tions, or unavailability on the testing dates. 


Results 


Table 1

Mean Reading Rates in Words per Minute as a
Function of Dot, Cell, and Line Spacing

Dot spacing   Rate     Cell spacing   Rate     Line spacing   Rate
   .080”      65.8        .123”       72.8        .163”       67.8
   .090”      73.3        .140”       69.6        .220”       74.7
   .100”      71.8        .160”       68.6        .300”       68.8


The average reading speed of the 275 Ss on
the equating material was 68 words per min-
ute. The lowest score was five words per
minute, the highest was 195 words per minute. 
The average reading speed of each of the 27 
groups of four Ss was made to equal 70 words 
per minute. Because of necessary last minute 
substitutes, five groups were changed so that 
their mean reading speed was not exactly 70 
w.p.m. These five groups ranged from 68 to 
77 w.p.m. 

Since only four Ss read any one of the ex- 
perimental combinations, the most appropri- 
ate way to arrange the data was to tabulate 
the reading speeds of all Ss who read Braille 
with a given spacing value of a given variable 
regardless of other values of other variables 
with which it was combined. There were 
nine such groups of 36 Ss, each group having 
read Braille containing one of the nine desig- 
nated spacing values (Table 1). 
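
The pooling described above can be sketched in Python: each reader's record carries the three spacing values he read, and readers are grouped by one variable at a time while the other two are ignored. The record layout and the placeholder rate of 70 words per minute are assumptions made for illustration; they are not the study's data.

```python
from collections import defaultdict
from itertools import product
from statistics import mean

DOT = (".080", ".090", ".100")
CELL = (".123", ".140", ".160")
LINE = (".163", ".220", ".300")

# One record per reader: 27 combinations x 4 readers = 108 records.
# The rate of 70.0 words per minute is a placeholder value.
records = [
    {"dot": d, "cell": c, "line": l, "wpm": 70.0}
    for d, c, l in product(DOT, CELL, LINE)
    for _ in range(4)
]

def marginal_means(records, factor):
    """Group readers by one spacing variable, ignoring the other two."""
    by_level = defaultdict(list)
    for rec in records:
        by_level[rec[factor]].append(rec["wpm"])
    return {level: (len(rates), mean(rates)) for level, rates in by_level.items()}

for factor in ("dot", "cell", "line"):
    print(factor, marginal_means(records, factor))
```

Each of the nine spacing values is represented by 9 combinations of 4 readers, which gives the groups of 36 Ss summarized in Table 1.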

Fig. 1. Average number of words per minute for
successive reading periods with three values of spac-
ing between dots. [Plot of words per minute against reading periods.]


Fig. 2. Average number of words per minute for
successive reading periods with three values of spac-
ing between cells. [Plot of words per minute against reading periods.]


Curves were drawn showing the mean read-
ing speed for each successive 25 minutes
(Figs. 1, 2, 3). Each curve represents 36
Ss who read Braille which contained a given
value. From inspection of Fig. 1, it can be
observed that Braille whose dot spacing is 
.080” is read consistently more slowly than 
Braille whose dot spacing is .090” and .100”. 
There seems to be little difference between the 
latter two. It can also be observed that there 
is no rise in the .080” curve, but there is im- 
provement by the readers of Braille whose 
dot spacing was .090” and .100”. Braille 
whose cell spacing is .123” is read faster than 
Braille whose cell spacing is .140” or .160” 
(Fig. 2). The .123” curve is consistently 
above the other two. The curve for .220” 
line spacing (Fig. 3) remains above the other 
two values at all points. 

Fig. 3. Average number of words per minute for
successive reading periods with three values of spac-
ing between lines. [Plot of words per minute against reading periods.]


Alexander’s test for trend (1) was the prin-
cipal statistical technique employed. Among
the sources of variance are differences between
group slopes and differences between group
means. Separate trend analyses were made
for the three spacing variables and these were
followed with two-factor analyses of variance.²
The trend analysis for dot spacing indicates 
significant differences between group slopes 
(p < .01) and the group means (p < .01). 
The analyses of variance between .080” and
.100” (p < .01) and .080” and .090” (p <
.01) both indicate that Braille whose dot
spacing is .080” is least readable. There was
no difference between .090” and .100”. The
trend analysis for cell spacing reveals signifi-
cant differences between group means (p <
.01) and group slopes (p < .01). The analy-
ses of variance demonstrate that the .123”
cell spaced Braille is more readable than
.160” (p < .05) but the difference between
.123” Braille and .140” Braille did not reach
the .05 level. The trend analysis for line
spacing also revealed extra-chance differences
for group slopes (p < .01) and group means
(p < .01), and the subsequent analyses of
variance demonstrate greater readability for a
line spacing of .220” as opposed to either
.163” (p < .01) or .300” (p < .01).


Discussion 


Within the limits of the values used, dot
spacings of .090” or .100”, cell spacings of .123”
or possibly .140”, and line spacing of .220” 
appear to be the most readable specifications 
of Braille for blind school children. .090” 
dot spacing and .220” line spacing are the 
values ordinarily used by the printing houses. 
However, a cell spacing of .123” is consider- 
ably closer than the .160” presently employed. 
Considering the small amount of previous re- 
search on the problem, it is somewhat sur- 
prising that the spacing combination which 
this study indicates is the most readable is so 
similar to present specifications. It is pos- 
sible that two 50-minute reading periods are 
not long enough to overcome the tendency to
perform well with Braille on which Ss are al- 
ready practiced. There are two arguments 
against this interpretation. The first is de- 
rived from inspection of the curves. If the 
results are due to the novelty of new spacing 
values, the latter should show greater im- 
provement than the standard values. This 
does not occur, with the exception of the 


2 The statistical tables are available at the Ameri- 
can Printing House for the Blind. 




.100” curve for dot spacing which is posi- 
tively accelerated and finally crosses the .090” 
curve. Secondly, Braille whose cell spacing is 
either .123” or .140” is more readable than 
material whose cells are spaced .160” apart. 
This should not have been the case if previ- 
ous experience were a major factor. Never- 
theless an experiment should be done in which 
Ss read over a considerably longer period of 
time, perhaps with just a few of the 27
combinations employed in the present
investigation. It would also be desirable to
investigate values which were not employed in the
present investigation.


Summary 


Three values of spacing between dots within 
Braille cells, three values of spacing between 
Braille cells, and three values of spacing be- 
tween Braille lines were read in all possible 
combinations in an effort to determine the 
most readable specifications of Braille print 
for school children. Each of the 27 com- 
binations was read for 100 minutes by sepa- 
rate groups of four Ss. The material was 
Chapters one through eight of The Black 
Arrow. The groups were equated on the ba- 
sis of an initial reading with standard Braille. 
The analysis indicated that a dot spacing of 
.080” is inferior to .090” or .100”, there being 
no significant difference between the latter 
two. A cell spacing of .123” or possibly .140” 
is more readable than cells spaced .160” apart. 
Braille whose line spacing is .220” is more 
readable than either of the other two values 
used. Comprehension was controlled and 
words per minute was the dependent variable. 


Received June 24, 1957. 


References 


1. Alexander, H. W. A general test for trend.
Psychol. Bull., 1946, 43, 553-557.

2. Best, H. Blindness and the blind. New York:
Macmillan, 1934.

3. Flesch, R. F. How to test readability. New
York: Harper, 1951.

4. Report of the Commission on Uniform Type for
the Blind, American Association of Instructors
for the Blind, 1920 convention.

5. Stevenson, R. L. The black arrow. Adapted by
J. Corlin and H. I. Christ. New York: Globe,
1947.





Journal of Applied Psychology
Vol. 42, No. 3, 1958


Intra-Individual Differences in Sensory Channel Preference ¹


Brian R. Kay 
University of New Hampshire 


This paper is an attempt to underline the 
importance of study of sensory differences 
within, as well as among, individuals. It is 
our contention that almost all workers con- 
centrating on perfecting techniques of pre- 
senting information to mass audiences through 
the two major sensory channels have failed to 
ask sufficient questions of their data. Con- 
sistent examination of data at only one level 
of analysis has resulted in contradictory con- 
clusions and as a consequence general prin- 
ciples have not been formulated. 

Operating within the framework of the de- 
gree of overlap in auditory and visual group 
performances, the typical question posed has 
been: “Do people learn, remember, or com- 
prehend more effectively when the material is 
presented to the eyes or to the ears?” Re- 
searchers have not asked “Are there any indi- 
viduals who clearly favor one sensory mo- 
dality over another in learning and remem- 
bering?” The concentration on this first 
question has not led to unequivocal answers. 

Earlier writers such as Fechner, Galton, 
Binet, and James were clearly impressed by 
the magnitude of the differences among indi- 
viduals with respect to their abilities to im- 
agine materials visually and auditorily. 

However, the writings of these men have 
been largely ignored (1, 2, 3, 4, 5). The reviewers
without exception have drawn attention
to the contradictory nature of available
evidence. The contradictions are attributed 
to the design of the experiments, to the size 
of the samples, to the nature of the materials 
or to the vagaries of chance. While we do 
not doubt that these factors are responsible 
for some of the different findings, it is our po- 
sition that what has been overlooked is the 
distorting presence of a few individuals in 
any one sample who are capable of function- 
ing significantly more effectively when the 
material is presented to one modality than 

1 Acknowledgment is made to the Central University
Research Fund of the University of New Hampshire
for a grant in support of this study.




when it is presented to another. Their pres- 
ence, we contend, has been masked by the 
concern with interindividual differences and 
with the statistics written for assessing signifi- 
cance of differences between group measures. 

The lack of recognition of these deviant in- 
dividuals in the literature could mean that 
their presence is so well-known as to be taken 
for granted, but we are not convinced that is
the case. Our problem, then, was to ascer- 
tain if such people exist, and, if so, in what 
types of mental performance they manifest 
their preference. 


Method 


The first phase of our research was devoted to a 
mass screening of university students with the pur- 
pose of identifying any deviant individuals that 
might be found. A task common to many studies has 
been that of immediate memory, and it was selected 
as our screening task. Thirty-eight words, all nouns 
and readily available in any high school student’s 
vocabulary, were arranged in pairs, an attempt being 
made to pair those that had minimal associative 
value. Examples were Banana—Snow, Boot—Milk, 
Paint—Cigarette. The pairs then were randomly as- 
signed to either the auditory or visual lists. The 
visual presentation was made through a 16 mm. pro- 
jector, and the auditory through a tape recorder. 
The instructions for the visual mode were as fol- 
lows: “This is a test of your ability to remember. 
You will be shown on the screen a list of words— 
two at a time. Watch carefully, for when the com- 
plete list has been shown once, you will be tested for 
your memory of the pair, given one word as a cue. 
Here are two examples of the type you will be 
shown. 

Cup—Pipe 
Book—Tree 


When the list of pairs has been shown once, you will 
be given the words that appear on the left and your 
task will be to associate those words with the ones 
with which they were previously paired. 

Are there any questions? 

Get ready.” 

The auditory instructions were identical except 
that the appropriate words to describe that sensory 
channel were substituted. 

Six groups of students participated in the screen- 
ing. They were drawn from introductory courses in 
Psychology and Human Relations. Three groups 







Table 1 


Group Performance on Auditory and Visual Modes 


N 
Auditory retention 262 
Visual retention 262 
Difference— 
Auditory-visual 


* Significant at the .1% level. 


were first presented with the auditory list and three 
the visual, the balanced design being used, of course, 
to control for transfer and/or inhibitory effects of 
order. Though it would have been tidier to have 
presented each list of words to both eyes and ears,
since this was a screening test and intra-individual 
differences were our primary concern, the expense 
did not seem justified. 


Results 


Table 1 presents the results for group per- 
formance. The superiority of the visual chan- 
nel is in agreement with many other studies 
(2). 

The difference was found to be significant, 
but does that mean that it would be better to 
present such material to the eyes than to the 
ears? If a decision had to be made through 
which channel to present our information, the 
visual would be better for more people than 
an auditory presentation. However, in mak- 
ing such a decision, one would ignore the 
fact that any auditorily dominant individuals 
would be penalized. 

Clearly the next question was to see if 
the differences in individual performance were 
worth worrying about. Still expressing intra- 
individual differences as a ratio we found that 
31 people, or 12%, favored visual over audi- 
tory by a factor of three or more and ranging 
as high as 9:1, while 11 individuals or 4% 
favored auditory over visual by a factor of 
three or more, again ranging as high as 9:1. 
These extreme people were found, regardless 
of the order of presentation. It is clear then 
the auditorily dominant individuals would not 
be receiving the materials through the mo- 
dality in which they were most efficient if the 
materials were presented visually. It is also 




clear that if we decided in favor of a visual 
presentation, since 45% were equal in per- 
formance on both modes and 16% favored 
auditory, we would be essentially deciding in 
favor of visual even though three-fifths of the 
cases did not favor visual. 
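
The ratio classification described above can be sketched in a few lines. This is only an illustration: the recall scores are hypothetical, the function names are ours, and treating a zero score on the weaker channel as an extreme preference is an assumption the paper does not discuss. Only the counts 31, 11, and N = 262 come from the text.

```python
from fractions import Fraction


def classify(visual, auditory, threshold=3):
    """Label one subject by the ratio of the two recall scores."""
    if visual == auditory:
        return "equal"
    hi, lo = max(visual, auditory), min(visual, auditory)
    # Assumption: a zero score on the weaker channel counts as extreme.
    extreme = lo == 0 or Fraction(hi, lo) >= threshold
    side = "visual" if visual > auditory else "auditory"
    return f"{side}-extreme" if extreme else side


def percent(count, n=262):
    """Share of the screened sample, rounded to a whole per cent."""
    return round(100 * count / n)


# Hypothetical (visual, auditory) recall scores, not the study's data.
subjects = [(9, 1), (6, 6), (2, 7), (5, 4), (3, 9)]
labels = [classify(v, a) for v, a in subjects]

# Counts reported in the text: 31 visually and 11 auditorily extreme of 262.
visual_extreme_pct = percent(31)
auditory_extreme_pct = percent(11)
```

The two percentages round to the 12% and 4% quoted in the text.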


Discussion 


These atypical people seem to be relatively 
few in number, and normally the discrepancy 
in their performance is likely to go unnoticed. 
In most real life situations they presumably 
can adapt themselves to the circumstances of 
the moment. If the problem is one of educat- 
ing or persuading the great majority of peo- 
ple, the detection and removal of these grossly 
atypical individuals from the sample should 
then lead to conclusive evidence, where su- 
periority in one mode of presentation exists, 
for the type of material studied. Where the 
problem, however, is concerned with influenc- 
ing each individual member of the group, the 
presence of a few extreme cases will necessi- 
tate either a consistently dual mode of pres- 
entation, or a division of the group according 
to its favored modality. Finally, when the 
problem is one of decision about an individual,
as is the case in personnel selection
and psychodiagnostics, appropriate procedures 
will presumably follow the recognition of such 
deviants. 


Received July 8, 1957. 


References 


1. Dale, E., Finn, J. D., & Hoban, C. F. Research
on audio-visual materials. In N. B. Henry
(Ed.), Yearb. nat. Soc. Stud. Educ., 1949, 48

2. Day, W. F., & Beach, B. R. A survey of the
research literature comparing the visual and
auditory presentation of information. Dayton,
Ohio: Wright Patterson Air Force Base, 1950.
AF Tech. Rep. No. 5921.

3. Elliott, F. F. Memory for visual, auditory and
visual-auditory material. Arch. Psychol., No.
199.

4. Stroud, J. B. Educational psychology. New
York: Macmillan, 1935.

5. Woodworth, R. S., & Poffenberger, A. T. Textbook
of experimental psychology. Unpublished
manuscript. Columbia Univer., 1920.





Journal of Applied Psychology 
Vol. 42, No. 3, 1958 


A Selection Set Preference Index ¹


Robert E. Krug 


Carnegie Institute of Technology 


In a previous article (1), it was shown that 
scores earned on the Ghiselli Self-Description 
Inventory (GSDI), a forced-choice adjective 
scale, increased under a variety of “assumed 
selection” sets. It was suggested that a gen- 
eral factor was responsible for this bias, and 
that control might be possible by equating 
items on a preference index obtained under a 
selection set. The present study investigates 
the efficacy of an index of this type. 


Procedure 


The adjectives of the GSDI were presented to Ss 
in two lists. One list contained the 64 generally fa- 
vorable terms, the other contained the 64 unfavor- 
able words. Each list was accompanied by a sheet 
containing the following instructions: “For each word 
in the accompanying list, decide how favorable this 
term would be as a description of a job applicant. 
Indicate your judgment on the answer sheet accord- 
ing to the following scale: 


1. An extremely favorable comment; it is difficult
to imagine anything better being said about a
job applicant.

2. A favorable comment; an employer would like
to hire people described in this way.

3. A somewhat favorable comment; while not
strongly positive, it is certainly not negative.

4. A neutral comment; neither favorable nor
unfavorable.

5. A somewhat unfavorable comment; while not
strongly negative, it is clearly not positive.

6. An unfavorable comment; an employer would
not want to hire people described in this fashion.

7. An extremely unfavorable comment; it is difficult
to imagine anything more damaging being
said about a job applicant.”


Judgments were made by 50 senior men in a col- 
lege of engineering. None of the Ss was familiar 
with the GSDI. 


Analysis of Results 


The numbers 1 through 7 were assigned to
the seven scale areas as indicated above. A
mean and standard deviation were computed
for each item. The mean value of the 50 


1 The author wishes to thank Doris Northrup and 
Sylvia Sebulsky for clerical and computational as- 
sistance. 




judgments for an item is the selection set 
preference index for that item. For purposes 
of reliability estimation, odd- and even-num- 
bered answer sheets were tabulated sepa- 
rately. The adjusted reliability coefficients 
for the list of favorable adjectives, the list of 
unfavorable adjectives, and for the total of 
128 adjectives were .94, .97, and .99, respec- 
tively. These were computed by correlating 
the item PI based on judgments of 25 odd- 
numbered raters with the PI obtained from 
the 25 even-numbered raters, and then ad- 
justing via the Brown-Spearman formula. It 
would appear that the number of Ss used was 
sufficient to obtain reasonably stable prefer- 
ence indices. 
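
The split-half procedure just described can be sketched as follows. This is a minimal illustration, not the author's computation: the ratings matrix is hypothetical, the function names are ours, and the adjustment is the standard Spearman-Brown step-up for doubled length, r' = 2r / (1 + r).

```python
from statistics import mean


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r_half):
    """Step a half-sample correlation up to the full number of raters."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(ratings):
    """ratings[r][i] is the 1-7 rating given by rater r to item i."""
    odd = ratings[0::2]   # 1st, 3rd, 5th, ... answer sheets
    even = ratings[1::2]  # 2nd, 4th, 6th, ... answer sheets
    n_items = len(ratings[0])
    pi_odd = [mean(r[i] for r in odd) for i in range(n_items)]
    pi_even = [mean(r[i] for r in even) for i in range(n_items)]
    return spearman_brown(pearson(pi_odd, pi_even))


# Hypothetical ratings: 4 raters by 4 adjectives (the study used 50 by 128).
ratings = [
    [1, 3, 5, 7],
    [2, 3, 5, 6],
    [1, 4, 6, 7],
    [2, 2, 5, 7],
]
reliability = split_half_reliability(ratings)
```

With real data the same call would take the 50 answer sheets in the order they were numbered.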

For each item pair in the GSDI a PI
discrepancy was obtained by the formula,
$D_i = PI_{i_2} - PI_{i_1}$, where $i_1$ is the first
member of the $i$th pair and $i_2$ is the second
member. Since the scoring system results in a
low PI for a favorable item, a discrepancy
with positive sign indicates that $i_1$ is a more
favorable description than $i_2$, while a
discrepancy with negative sign indicates the reverse.
The D measure, then, is a prediction of the 
magnitude and direction of selection bias ex- 
pected of a given pair. To conserve space, 
the list of means and standard deviations for 
the 128 adjectives is not presented.² Since
the D measure is based upon these item 
means, a comment concerning the significance 
of the mean differences may be in order. An 
analysis of variance design was applied to the 
judgments; using the error term from this 
analysis, a difference as large as .31 is sig- 
nificant at the .05 level. This would indicate 
that 38 D’s are significant. The difficulty 
with this approach is that if we are interested 
in equating items so as to control some source 
of variation, members of a pair are assumed 
to be different unless proved equivalent. This 
is opposite to the usual statement of the null 
hypothesis and would suggest that we accept 


2 A copy of this list is available from the author. 







Table 1 
Distribution of PI Discrepancies for the 
64 Pairs of the GSDI 


D

+1.26 to
+ .76 to
+ .26 to
+ .25 to
− .26 to
− .76 to
−1.26 to −


a very low level, perhaps .10 or .20 as indi- 
cating a significant difference. For this rea- 
son, all D’s are employed in the correlational 
analysis which follows. It is also true, of 
course, that an observed D of 0.00 is as use- 
ful a predictive index as any other. Table 1 
shows the distribution of D for the 64 pairs of 
items in the GSDI. The corrected split-half 
reliability of D is .951. 
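
As a small illustration of the discrepancy measure, the sketch below computes D for a few pairs. The PI values and pair names are hypothetical. The subtraction order is an assumption chosen to agree with the stated interpretation (low PI is favorable, and a positive D means the first member is the more favorable one). The .31 cutoff is the significant difference quoted in the text.

```python
def discrepancy(pi_first, pi_second):
    """PI discrepancy for one pair; positive when the first member is more favorable.

    Assumes the 1-7 scale of the study, where a LOW mean rating is favorable.
    """
    return pi_second - pi_first


# Hypothetical (first member PI, second member PI) values, not the study's data.
pairs = {
    "pair A": (2.1, 2.9),
    "pair B": (3.4, 3.4),
    "pair C": (5.8, 5.2),
}

d = {name: round(discrepancy(a, b), 2) for name, (a, b) in pairs.items()}

# Pairs whose members differ by at least the .31 quoted as significant at .05.
flagged = sorted(name for name, value in d.items() if abs(value) >= 0.31)
```

A pair such as "pair B" with D = 0.00 is still informative: it predicts no selection bias in either direction.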

In order to test the discrepancy as a pre- 
dictor of bias and, by inference, test the se- 
lection set PI as a controller of bias, data 
available from the previous study (1) were 
employed. In that study, an independent 
sample of 46 Ss completed the GSDI under 
an “accurate self-description” set and under 
three “assumed selection” sets, where the 
relevant objective was initiative, intelligence, 
or self-assurance. Each of these sets led to 
score increases on the relevant key, but, in 
addition, the bias introduced by the initiative 
and intelligence sets generalized to all three 
relevant keys as well as to an irrelevant key. 
This pattern of increases was interpreted as 
due to a general factor which might be termed 
selection bias. It was suggested that the self- 
assurance key contained less of this factor 
than did the other two keys. If the selection 
set PI is a measure of this factor, the PI dis- 
crepancy (D) of the present study should 
predict the responses made on the assumed 
selection sets of the previous study. Since 
the GSDI requires the S to select the most 
descriptive term in the first 32 pairs (favor- 
able terms) and the least descriptive term in 
the last 32 pairs (unfavorable terms), it is 
evident that the relevant correlation is be- 
tween the D measure as one variable, and the 


Table 2
Correlations Between PI Discrepancy and Frequency
Choosing More Favorable Term Under
Several Sets


Self-Description     Intelligence     Initiative


32 favorable pairs      .525   .877
32 unfavorable pairs    .442
All pairs


.763
01
.668


frequency choosing $i_1$ for pairs 1–32 and $i_2$
for pairs 33–64 as the second variable. These
correlations are presented in Table 2.

It will be noted that the selection set PI
predicts frequency of choice, not only on the
assumed selection sets, but on the “accurate
self-description” set as well. It would appear
that the original pairing of items was
inadequate, in the sense that members of a
pair are not equally preferred as self-descriptive
terms. However, the correlations with
the initiative and intelligence sets are uniformly
higher than those with either the self-assurance
or accurate self-description sets. The significance
of these differences was tested using the
formula suggested by McNemar (2, p. 148)
for correlations involving one common variable.
Table 3 presents the t values associated
with the difference between the self-description
set and the two selection sets. We may
summarize this table by saying that the PI
discrepancy is related to selection set responses
to a significantly greater degree than
it is to accurate self-description responses.


Table 3

Values of t and p for Differences Between
Correlation Coefficients

Between Self-Description Set and:
Initiative Set     Intelligence Set
t     p     t     p

Favorable pairs      2.16   .05
Unfavorable pairs    1.74   .10
All pairs            2.92   .01

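
For readers who want to see how two correlations sharing one variable can be compared, here is a minimal sketch using Hotelling's t, a standard test for this situation. Whether it is the exact formula on the cited page is an assumption. The correlation of .60 between the two response-frequency variables is hypothetical, since the paper does not report it, and the function name is ours.

```python
import math


def hotelling_t(r12, r13, r23, n):
    """t for the difference r12 - r13 when variable 1 is common to both.

    r23 is the correlation between the two non-common variables and
    n is the number of cases; the statistic has n - 3 degrees of freedom.
    """
    det = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    return (r12 - r13) * math.sqrt((n - 3) * (1 + r23) / (2 * det))


# Variable 1: D. Variables 2 and 3: choice frequencies under two sets.
# .877 and .525 appear in Table 2; r23 = .60 is a made-up placeholder.
t_example = hotelling_t(r12=0.877, r13=0.525, r23=0.60, n=32)
```

Here the cases are item pairs (32 per half of the inventory, 64 in all), so the degrees of freedom would be 29 or 61.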
Table 2 also shows a consistently higher 
correlation for favorable pairs than for generally
unfavorable pairs. Using Fisher’s r to z
transformation, the difference .877 − .643 is
significant at the .05 level; the other differ- 
ences are not statistically significant. 
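
The comparison of .877 with .643 can be reproduced with the usual Fisher r-to-z test for two independent correlations. The sketch assumes each correlation is based on 32 item pairs and uses a two-tailed normal approximation; the function name is ours.

```python
import math


def fisher_z_test(r1, n1, r2, n2):
    """Return (z, two-tailed p) for the difference between two independent r's."""
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher's r-to-z transformation
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # area in both tails of the normal curve
    return z, p


# Favorable versus unfavorable pairs, assuming 32 pairs behind each coefficient.
z, p = fisher_z_test(0.877, 32, 0.643, 32)
```

Under these assumptions z is about 2.3, which falls beyond the .05 level and short of the .01 level, in line with the statement in the text.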


Discussion 


If the members of a forced-choice pair were 
identical in general desirability, the discrep- 
ancy between those members on a selection 
set preference index should represent the po- 
tential bias unique to the selection situation. 
The correlation coefficients observed in this 
study suggest that the members of GSDI pairs 
are not equivalent on this general desirability. 
Consequently, the discrepancy measure em- 
ployed confounds general desirability with 
selection situation desirability. While the 
relative contribution of these two sources of 
bias might be estimated from the available 
data, such apportionment does not appear 
crucial to our argument. What appears im- 
portant to the development of forced-choice 
instruments is the fact that both of these 
sources are real, and must be controlled. The 
data suggest that a favorability index which 
considers the situation in which the scale is 
to be used is a step in the direction of ade- 
quate control. In this respect, it should be 
noted that since our Ss were all engineering 
students, the selection set preference index 
may contain variance specific to technical 
jobs. We do not know how specific we must 
be in our definition of the situation. If the 
preference index changes with the group, a 
series of preference indices would be necessary 
to control selection bias. Obviously, it would 
be desirable to possess one scale which was 
usable in a variety of selection situations, but 
this may not be possible. It should also be 
noted that we have ignored the issue of the 
discrimination index. While it is possible to 
equate words on a variety of bases, there is 




no guarantee that such words will have dif- 
ferential value for some criterion performance. 
This issue is clearly susceptible to empirical 
resolution. 

One incidental finding deserves some men- 
tion. Forced-choice scales are reputed to be 
unpopular with Ss, in large part because of 
the necessity to select from a group of unfa- 
vorable alternatives. It would be desirable 
to eliminate negative terms if this could be 
done without loss of validity. While our data 
are by no means conclusive, it must be noted 
that (a) judgments of negative terms are as 
reliable as judgments of positive terms, and 
(b) responses to negative pairs appear less in- 
fluenced by the relative favorability of the 
members of the pair. It is at least conceiv- 
able that an S is more highly motivated to 
make an accurate description in the case of 
negative pairs, since it may appear that the 
risk involved is greater. 


Summary and Conclusions 


1. A selection set preference index was de- 
veloped for the adjectives of the Ghiselli Self- 
Description Inventory. This index was used 
to compute a PI discrepancy for each pair of 
the inventory. 

2. The PI discrepancy was shown to be re- 
liable and significantly related to responses 
made under several conditions in a previous 
study. 

3. It was concluded that the selection set 
PI contained both general desirability and se- 
lection situation desirability components; con- 
trol of both appears necessary for use in se- 
lection. 


Received July 18, 1957. 


References 


1. Krug, R. E. The effect of specific selection sets 
on a forced-choice self description inventory. 
J. appl. Psychol., 1958, 42, 89-92. 

2. McNemar, Q. Psychological statistics. New York:
Wiley, 1955.





Journal of Applied Psychology 
Vol. 42, No. 3, 1958 


Preference for Foods in Relation to Cost ¹


Purnell H. Benson ² and David R. Peryam


Quartermaster Food & Container Institute, Chicago 


The recognized ultimate goal in studying 
the food preferences of military personnel is 
to be able to plan meals which will give opti- 
mum satisfaction and thereby contribute to 
morale and efficiency. Preference, however, 
cannot be considered alone, since there are 
limitations upon the freedom with which foods 
may be selected. Fulfillment of nutritional 
needs is a primary requirement in menu plan- 
ning. Cost is another important one. The 
amount of money available to the Services for 
food is adequate only if constant vigilance is 
exercised in the planning of menus, purchase 
of foods, and preparation of meals. 

This paper will consider only the preference 
versus cost aspect. Essentially the problem 
is one of optimization—what foods should be 
included on the menu in order to provide 
maximum consumer satisfaction for the money 
spent. Assuming that preferences as verbally 
reported by soldiers are true indicators of 
what they want to eat, the problem may be 
restated as one of providing the “most pref- 
erence” per dollar. Since this first study was 
intended as exploratory it was limited to only 
one type of food. Meat dishes were selected 
for investigation because of their central im- 
portance in most meals, and the relationship 
of preference to cost was examined. 


Procedure 


Cost data for dishes served on the standard Army 
installation ration are not as readily obtained as 
might be expected. The initial cost of the ingredi- 
ents is not the only thing involved, since greater or 
lesser amounts of time may be required in prepara- 
tion; however, time studies of labor costs were not 


1 This paper reports research undertaken at the
Quartermaster Food and Container Institute for the 
Armed Forces, and has been assigned Number 725 in 
the series of papers approved for publication. The 
views or conclusions contained in this report are 
those of the authors. They are not to be construed 
as necessarily reflecting the views or indorsement of 
the Department of Defense. 

2 Present address: Drew University, Madison, New 
Jersey. 


undertaken. The analysis reported here was based 
upon the wholesale cost of the meat constituent of 
each dish, excluding the cost of labor or of added 
ingredients. However, the latter are usually much 
less than the costs of the meat. 

Cost data were obtained by averaging the Chicago 
wholesale prices for April and August, 1956, as listed 
in The National Provisioner. The actual listed prices 
for chicken, turkey, pork, and ham could be directly 
used; however, the costs for beef and veal had to be 
derived by an indirect method. The Services purchase
carcass beef and veal, which is then divided
into the various classes appropriate for different 
types of preparation. Beef is given a six-way classi- 
fication into dry heat steaks, moist heat steaks, dry 
heat roasts, moist heat roasts, diced beef, and ground 
beef. Veal is classified into ground veal and solid 
cuts to be used for cutlets, steaks and roasts. The 
total carcass price was apportioned among the dif- 
ferent classes of either beef or veal according to the 
relative wholesale market prices at which the meat 
would have been obtained if the cuts had been pur- 
chased separately. Prices of typical cuts in each of 
the classes were averaged for this purpose. 

Preference ratings for the meat dishes were ob- 
tained in a series of Army-wide food attitude sur- 
veys conducted by the Quartermaster Corps during 
the years 1950-54 (6). The values used were mean 
ratings on the hedonic scale, a 9-category scale whose 
points are described as successive degrees of liking 
and dislike (4), obtained from random samples of 
Army enlisted men. The sample size varied from 
1,500 to 4,000. Evidence from repeated surveying 
has shown that these mean preference ratings change 
very little from year to year. 

A curve of the form $Y = a + bX + c\sqrt{X}$ was
fitted to a plot of the preference ratings and cost
estimates, Y indicating the preference variable and
X the cost variable. This form for the preference
function was previously developed in a pilot study
of the relationship of food preference and cost
conducted among students at a college dining hall
(1).


Results and Discussion


Table 1 gives the mean preference ratings
and costs for the 17 meat dishes included in
the study. The cost figure is the actual cost
of the meat in one standard serving. In Fig. 1
preference has been plotted against cost and
the curve drawn in for the equation applicable
within the range of observations. Constants
were obtained for the function by the
method of least squares, which gave the
following regression equation:


$$Y = 3.65 - .101X + 1.37\sqrt{X}.$$


Table 1

Mean Preference Ratings and Cost per Serving for
17 Meat Items Served in Army Rations

                                  Mean         Cost
                                  Preference   per
     Food Item                    Rating       Serving

 1   Chili con carne              6.6          .088
 2   Beef stew                    6.7          .088
 3   Vealburger                   6.7          .112
 4   Swedish meat balls           7.0          .112
 5   Meat loaf                    7.2          .112
 6   Meat balls with spaghetti    7.4          .112
 7   Hamburger steak, baked       7.5          .112
 8   Pot roast of beef            7.9          .150
 9   Chicken salad                7.3          .166
10   Baked ham                    7.7          .216
11   Breaded pork chop            7.5          .220
12   Grilled pork chops           7.8          .220
13   Fried chicken                8.2
14   Breaded veal cutlets         7.6          .240
15   Roast beef                   8.0          .240
16   Roast turkey                 8.1          .270
17   Grilled steak                8.3          .360
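
A least-squares fit of a curve of the form Y = a + bX + c√X can be sketched as below. This is an illustration under stated assumptions, not the authors' computation: the ratings and costs are as read from Table 1 with decimal points restored, fried chicken is left out because its cost is not legible in the source, cost is expressed in cents (the unit on the cost axis of Fig. 1), and the helper names are ours. With one dish missing, the constants need not match the published ones.

```python
import math

# (rating, cost in dollars per serving) for the 16 dishes with a legible cost.
TABLE_1 = [
    (6.6, .088), (6.7, .088), (6.7, .112), (7.0, .112), (7.2, .112),
    (7.4, .112), (7.5, .112), (7.9, .150), (7.3, .166), (7.7, .216),
    (7.5, .220), (7.8, .220), (7.6, .240), (8.0, .240), (8.1, .270),
    (8.3, .360),
]
# Assumption: X is cost in cents per serving.
POINTS = [(cost * 100, rating) for rating, cost in TABLE_1]


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= f * m[col][k]
    x = [0.0] * 3
    for i in (2, 1, 0):
        x[i] = (m[i][3] - sum(m[i][j] * x[j] for j in range(i + 1, 3))) / m[i][i]
    return x


def fit(points):
    """Least-squares constants (a, b, c) via the normal equations."""
    rows = [(1.0, x, math.sqrt(x)) for x, _ in points]
    ys = [y for _, y in points]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve3(xtx, xty)


def predict(coef, x):
    a, b, c = coef
    return a + b * x + c * math.sqrt(x)


def multiple_r(points, coef):
    """Correlation between observed ratings and the fitted curve."""
    ys = [y for _, y in points]
    y_bar = sum(ys) / len(ys)
    sse = sum((y - predict(coef, x)) ** 2 for x, y in points)
    sst = sum((y - y_bar) ** 2 for y in ys)
    return math.sqrt(1 - sse / sst)


coef = fit(POINTS)
r = multiple_r(POINTS, coef)
# Positive residual: the dish is rated higher than the curve predicts for its cost.
residuals = [y - predict(coef, x) for x, y in POINTS]
```

The residuals give the vertical distance of each dish from the fitted curve, which is the quantity the later discussion of dishes above and below the regression line relies on.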


The curve has several points of interest.
As might be expected, preference increases as
the cost of the meat constituent increases.
The relationship is fairly close, the coefficient
of curvilinear correlation being +.86. The
curvilinearity is statistically significant at the
.95 level of confidence. The slope of the
curve diminishes with increasing cost. This
indication that equal increments in cost do
not produce equal gains in consumer satisfaction
may be generally true. For example,
most people of average income would agree
that the increased satisfaction of eating squab
rather than chicken would not be great enough
to warrant spending the family budget on the
former, whereas the increased satisfaction of
eating steak rather than stew very often
merits paying the higher price. These data
suggest that the same principle, that of
diminishing returns, applies in military feeding
even though here the consumer is not
immediately concerned with costs. It is likely
that this curvature effect is also due, in part,
to inequality of the intervals of the preference
scale. As the level of average preference
increases and approaches the end of the scale,
which would correspond to a rating of 9.0,
the respondent’s opportunity for indicating a
high degree of liking is progressively more
restricted. Thus, a change in mean rating from
7.0 to 7.5 may represent less increase in actual
preference than a change from 8.0 to 8.5.
Previous research on the hedonic scale method
has supported this explanation (2, 3).

Fig. 1. Relation of preference to cost for meat
dishes served in the Army garrison ration.
(Dishes numbered as in Table 1.) Axes: mean
hedonic rating (preference) against cost in
cents per serving.

Interpretation of these results will be facili- 
tated by assuming, for the present, that over 
the years the Quartermaster Corps has spent 
money for various types and grades of meat 
and other foods, at various prices, in ways 
which have approximately maximized the sol- 
diers’ satisfaction with the ration within the 
limitations imposed by a restricted budget. 
For example, we might assume that they 
would be about as well satisfied with meat 
balls, which take only 11¢ of the daily food 
allowance, as with roast turkey, which takes 
27¢ of the same allowance, if they realized 
that the higher priced meat would substan- 
tially curtail the outlay for vegetables and 
desserts. Then the curve may be considered 
as indicating how much preference must be 
augmented by the purchase of a more expen- 
sive meat item in order to maintain over-all 
satisfaction at a given level. Serving fancy 
meat dishes would augment satisfaction from 
that source but, presumably, not enough to 
compensate for the loss in satisfaction that
would result from having less money to spend
for other foods in the same meal or for all
foods, including meat, at future meals. It
might be stated as a general rule that meat
dishes significantly above the regression line
are good “preference buys” and those below
the line are not. For example, pot roast of
beef (No. 8 in Fig. 1) is a good “buy” be- 
cause its level of preference is high in relation 
to its cost. The cost of grilled steak seems 
less justified by its preference since this item 
falls well below the regression line. 
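The "preference buy" rule can be sketched in a few lines. This is only an illustration: the dish names and numbers below are invented, a straight least-squares line is assumed in place of the fitted curve of Fig. 1, and the sketch ignores the requirement that a dish lie significantly above the line.

```python
# Hypothetical (cost in cents per serving, mean hedonic rating) pairs.
# These values are invented for illustration; they are not the article's data.
dishes = {
    "dish A": (11.0, 7.0),
    "dish B": (15.0, 7.6),
    "dish C": (20.0, 7.4),
    "dish D": (27.0, 8.1),
    "dish E": (32.0, 7.7),
}

costs = [cost for cost, _ in dishes.values()]
prefs = [pref for _, pref in dishes.values()]

# Ordinary least-squares line: preference = intercept + slope * cost.
mean_cost = sum(costs) / len(costs)
mean_pref = sum(prefs) / len(prefs)
slope = (
    sum((c - mean_cost) * (p - mean_pref) for c, p in zip(costs, prefs))
    / sum((c - mean_cost) ** 2 for c in costs)
)
intercept = mean_pref - slope * mean_cost

# Residual = observed preference minus the preference expected at that cost.
residuals = {
    name: pref - (intercept + slope * cost)
    for name, (cost, pref) in dishes.items()
}

# Dishes above the line are candidate "preference buys".
good_buys = sorted(name for name, r in residuals.items() if r > 0)
print(good_buys)
```

A positive residual marks a dish whose rating is high for its cost; a negative residual marks one whose cost seems less justified by its preference.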

Studies on food monotony (5) and fre- 
quency of serving conducted at the Institute 
have shown that, in general, preference for a 
food falls when its frequency of serving is in- 
creased far enough. This suggests that dishes 
located above the regression line in the pres- 
ent study, which accordingly are good pref- 
erence buys, could be served more often up 
to the point at which preference falls below 
the regression line. At the same time, the fre- 
quency of serving of dishes below the line 




could be adjusted until they rise to the re- 
gression line. Further research is needed to 
determine in a more detailed way how changes 
in frequency of serving will affect preference. 

In projecting possible applications of these 
findings it must be recognized that there are 
restrictions on military buying other than 
budgetary ones. For example, since beef and 
veal are purchased by the carcass, in order to 
take advantage of the lower prices it is neces- 
sary to use the various cuts as they naturally 
occur rather than in the proportions that 
might be most desirable from the standpoint 
of the preference-cost relationship. Again, as 
the largest single buyer of food in the coun- 
try, the Services have a major responsibility 
for helping to maintain the stability of mar- 
ket prices. For example, if the Army decided 
to start buying only certain cuts or grades of 
meat because of their favorable preference- 
cost position, the volume of purchase would 
be so great that prices might rise and wipe 
out the cost advantage. 

Inferences made from any preference-cost 
analysis such as the one presented would have 
to be applied with caution until the effects 
of price fluctuations had been investigated. 
Continuing studies of cost data from month 
to month would be required for secure con- 
clusions. Preference for a given food might 
justify buying it when the price is low but 
not when the price is high. Generally, greater 
flexibility is possible in selecting components 
of the ration other than meats. Thus, it may 
be easier to use preference-cost curves for ad- 
justing their frequencies of serving in order 
to provide soldier consumers with greater 
satisfaction per dollar. 


Other Applications 


The optimization of meal planning is a type 
of preference analysis which can be used with 
other consumer problems. The analysis made 
here is illustrative of concepts and lines of 
analysis which could usefully be pursued 
wherever one has to deal with group behavior 
toward products whose qualitative appeal de- 
pends upon cost. Besides institutional menu 
planning and commercial areas of food proc- 
essing and distribution, this would include 
many consumer necessities and luxuries. The 










results presented illustrate certain fundamen- 
tal ideas which can guide consumer research. 


Received July 26, 1957. 


References 


1. Benson, P. H. A model for the analysis of con- 
sumer preference and an exploratory test. J. 
appl. Psychol., 1955, 39, 375-381. 

2. Edwards, A. L. The scaling of stimuli by the 
method of successive intervals. J. appl. Psy- 
chol., 1952, 36, 118-122. 




3. Jones, L. V., & Thurstone, L. L. The psycho- 
physics of semantics: An experimental investi- 
gation. J. appl. Psychol., 1955, 39, 31-36. 

4. Peryam, D. R., & Girardot, N. F. Advanced 
taste test method. Food Eng., 1952, 24 (7), 
58-61. 

5. Pilgrim, F. J., & Schutz, H. G. Monotony in 
food acceptance. Amer. Psychologist, 1955, 
10, 503. 

6. Wood, K. R., & Peryam, D. R. Preliminary 
analysis of five Army food preference surveys. 
Food Technol., 1953, 7, 248-249. 





Journal of Applied Psychology 
Vol. 42, No. 3, 1958 


Response Set in Measurement of Food Preference ¹


Howard G. Schutz ² and Joe Kamenetzky


Quartermaster Food and Container Institute, Chicago 


In many surveys of consumer preference, 
respondents are presented with the names of 
products and are asked to indicate their de- 
gree of liking or dislike for each. During the 
years of 1950-1954, surveys of over 30,000 
military personnel were conducted by this In- 
stitute in order to determine the relative pref- 
erences for more than 400 commonly served 
foods. Each respondent rated up to 54 food 
names on a nine-interval scale (1, 2), the in- 
tervals being successively anchored with the 
following descriptive categories: like ex- 
tremely, like very much, like moderately, like 
slightly, neither like nor dislike, dislike 
slightly, dislike moderately, dislike very much, 
and dislike extremely. The scale categories 
were assigned successive integers from 1 to 9 
beginning at the dislike end, and the ratings 
then treated quantitatively. Mean ratings 
fulfilled the major purposes of allowing the 
items to be rank ordered and of guiding the 
menu planners in the selection of foods to be 
served in future meals. 
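The scoring scheme just described can be sketched as follows. The function names and the sample responses are hypothetical; only the nine category labels and the 1-to-9 coding from the dislike end come from the text.

```python
from statistics import mean

# The nine scale categories, ordered from the dislike end, as described above.
CATEGORIES = [
    "dislike extremely",
    "dislike very much",
    "dislike moderately",
    "dislike slightly",
    "neither like nor dislike",
    "like slightly",
    "like moderately",
    "like very much",
    "like extremely",
]

# Successive integers 1..9 beginning at the dislike end.
SCORE = {label: value for value, label in enumerate(CATEGORIES, start=1)}


def mean_rating(responses):
    """Mean of the integer codes for a list of category labels."""
    return mean(SCORE[label] for label in responses)


def rank_foods(responses_by_food):
    """Food names ordered from highest to lowest mean rating."""
    return sorted(
        responses_by_food,
        key=lambda food: mean_rating(responses_by_food[food]),
        reverse=True,
    )


# Invented responses, for illustration only.
sample = {
    "food X": ["like extremely", "like moderately"],
    "food Y": ["like slightly", "dislike slightly"],
}
print(rank_foods(sample))
```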

However, several questions arose in the interpretation
of the survey results. First, the ingredients used in
most foods, and the methods of preparing them, are not
constant; variability in these factors should be reflected
in the variability of the mean preference ratings of
different preparations of these items when they are
actually served. Over time, most individuals experience
different qualities of servings of the same foods, and
it is not known
whether survey respondents evaluate the foods 
in terms of some ideal or idealized experience 
with them, their least favorable experiences, 
or some “average” experiences. The practical 


¹ This paper reports research undertaken at the
Quartermaster Food and Container Institute for the 
Armed Forces, and has been assigned Number 781 in 
the series of papers approved for publication. The 
views or conclusions contained in this report are 
those of the authors. They are not to be construed 
as necessarily reflecting the views or indorsement of 
the Department of Defense. 

² Now at Battelle Memorial Institute, Columbus,
Ohio. 


implications of this problem will be discussed 
later. 

Second, mean taste-test ratings of food 
items prepared and evaluated under labora- 
tory conditions are typically lower than the 
corresponding survey means. This fact sug- 
gests the hypothesis that survey respondents 
evaluate foods as they remember the better 
servings of them. 

The primary purpose of the present study 
was to determine the set or frame of reference 
survey respondents use in rating food names 
according to preference. 


Method 


The respondents were 305 male enlisted 
personnel attending service schools at the 
Great Lakes Naval Training Center, Great 
Lakes, Illinois. As they left the mess hall 
following the noon meal the men were di- 
vided into three groups of approximately 100 
men in each. Each group was administered a 
questionnaire consisting of 54 foods selected 
from previous menus served at this installa- 
tion as representing eight food classes: main 
dishes, desserts, vegetables, soups, beverages, 
potatoes and starches, breakfast foods, and 
breads. The foods were listed in random or- 
der. All groups rated each food on the scale 
described above. The groups were randomly 
given one of three sets of instructions. 

Respondents in the first group (N = 101)
rated the foods under the usual survey in- 
structions which read, in part: “For each food 
listed in the following pages, circle the reply 
which tells how much you like or dislike that 
food.” On the final page each was asked 
to indicate the quality of food servings he 
thought of when he rated the foods: best, 
better than average, average, poorer than av- 
erage, poorest. 

Members of the second group (N = 100)
were asked to rate the “Best Serving” of each 
of these same foods that they had ever eaten, 
and those in the third group (N = 104) rated










the “Poorest Servings.” In order to insure 
maintenance of the “Best Serving” and “Poor- 
est Serving” sets, “Rate the BEST (or POOR- 
EST) SERVING of each food,” was printed 
at the top and bottom of each of the six pages 
of the questionnaire. All respondents were 
instructed to circle a “Not Tried” category 
for a food if they thought they had never 
eaten that food. 

It was felt that this method is preferable to 
having each respondent rate the foods under 
all three instructions. The latter procedure 
would likely have had the effect of subtly 
suggesting to the respondents that the rating 
made under the usual survey instructions 
should lie between the ratings made under 
the other two instructions. However, in other 
instances where a specific instruction is not 
the standard being investigated, the exaggera- 
tion of differences between forms of instruc- 
tions may not be serious and may even be 
preferable. 


Results and Discussion 


The detailed results for each of the 54 foods 
are too lengthy to be presented here.³ Instead,
a summary of the findings is given.
First, while 29 individual preference means 
are higher in the “Best Serving” group than 
in the normal or survey instruction group, 24 
differences were in the opposite direction. On 
the other hand, with but one exception and 
one tie in means, the means for the normal 
instruction group are higher than the “Poor- 
est Serving” means; and with three excep- 
tions, the means for the “Best Serving” group 
are higher than the “Poorest Serving” means. 
Thus, there appears to be no difference between
the normal instructions and the “Best Serving”
instructions, whereas each differs from the
“Poorest Serving” set.
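One way to put these direction counts on a common footing is a two-sided sign test. The authors do not report such a test; the sketch below is an added illustration under stated assumptions. Ties are dropped, no ties are assumed for the "Best" versus "Poorest" comparison, and the 54 foods are treated as independent even though each group's raters rated all of them.

```python
from math import comb


def sign_test_p(n_higher, n_lower):
    """Two-sided sign-test p-value for a split of non-tied differences."""
    n = n_higher + n_lower
    k = max(n_higher, n_lower)
    # Probability of a split at least this lopsided when each direction
    # is equally likely (binomial with p = 0.5).
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Counts taken from the summary above.
p_best_vs_normal = sign_test_p(29, 24)     # 29 higher, 24 lower
p_normal_vs_poorest = sign_test_p(52, 1)   # one exception, one tie dropped
p_best_vs_poorest = sign_test_p(51, 3)     # three exceptions, no ties assumed

print(p_best_vs_normal, p_normal_vs_poorest, p_best_vs_poorest)
```

Under these assumptions the 29-to-24 split is compatible with chance, whereas the two comparisons against the "Poorest Serving" group are not.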

Inspection of Table 1 reveals that these re- 
sults also obtain when we consider the means 
of the food groups and the overall means, 
even though those evaluating the items under 
the normal instructions stated to the final 
question that they had rated in terms of 


³ A three-page table giving the mean rating, standard
deviation, and percentage “Not Tried” for each
food under each type of instruction has been de- 
posited with the American Documentation Institute. 
Order Document No. 5547, remitting $1.25 for 35- 
mm. microfilm, or $1.25 for 6 by 8 in. photocopies. 




Table 1


Mean Ratings of Food Groups as
Function of Instructions


                       No. of                 Instructions
                       Foods in
Food Group             Group      Normal   “Best Servings”   “Poorest Servings”

Main dishes
Vegetables
Desserts
Potatoes & Starches
Soups
Breads
Breakfast Foods
Beverages
All Foods

[Numerical entries are illegible in the source.]


slightly “better than average” servings. It 
also appears that the percentages of respond- 
ents endorsing the “Not Tried” category for 
any item did not vary as a function of in- 
structions. 

If respondents in surveys do rate in terms 
of what they consider to be “Best Servings,” 
then any departure from this optimum should, 
of course, result in lower ratings. This conclusion
would help explain the previously mentioned fact that
foods evaluated in taste tests are almost always rated
lower than when evaluated in surveys, since in the
latter case the foods are idealized.

The method and data have uses other than 
serving as aids in interpreting survey ratings. 
Thus, by inspecting the differences between 
“Best Servings” and “Poorest Servings,” we 
can select for further investigation those items 
for which variations of ingredients or meth- 
ods of preparation are of importance in the 
determination of preference. Some foods at 
all levels of preference exhibit large variation 
as a function of instructions. The relatively 
large differences between “Best Servings” and 
“Poorest Servings” for such items having dif- 
ferent levels of preference as roast chicken, 
oven-browned potatoes, and Harvard beets, 
as well as for such food groups as potatoes, 
suggest that these might prove to be more ac- 
ceptable if certain procurement and prepara- 
tion procedures are followed. In contrast, the 
extremely small differences between instruc- 
tions for fresh milk may be due to the high 
level of quality control practiced in the proc- 
essing of this product. 

This general approach of obtaining ratings 
under three instructions might be of practical 







value in other situations involving judgments 
in the form of ratings. One example is the 
case where a certain brand of a product (e.g., 
a prepared cake mix) is used by consumers 
under varying conditions. Detection of large 
differences between instructions as a function 
of usage would demonstrate the lack of versa- 
tility of the product and would point to the 
necessity of instituting such corrective meas- 
ures as modifications of the product itself or 
of instructional programs. 

A second case concerns instances where per- 
sons are the objects of assessment. A rater’s 
over-all opinion of someone in and of itself 
gives no estimate of the ratee’s variability, 
either in terms of general performance or on 
any trait such as “initiative.” It is possible 
that much of the inter-rater differences might 
be attributed to the fact that some evaluate 
an individual at his best, some at his worst, 
and others somewhere between these extremes. 
Having the raters perform multiple ratings
might enable (a) adjusting the raters’ evaluations
according to their frames of reference and
(b) detecting those ratees whose performances
vary markedly.


Summary 


Over 300 military personnel, assigned randomly
to one of three groups, were asked to
indicate their degree of liking for 54 food 
items belonging to eight food types. Mem- 
bers of the first group rated each item under 
customary instructions, those in the second 
rated the “Best Servings” they ever ate of 
these same foods, and those in the third rated 
the “Poorest Servings.” The results suggest 


that, regardless of food type, food items are


evaluated in terms of the most favorably re- 
membered experiences with them. Some prac- 
tical implications of the approach used in this 
study are discussed. 


Received September 27, 1957. 


References 


1. Jones, L. V., Peryam, D. R., & Thurstone, L. L. 
Development of a scale for measuring soldiers’ 
food preferences. Food Res., 1955, 20, 512- 
520. 

2. Peryam, D. R., & Pilgrim, F. J. Hedonic scale 
method of measuring food preferences. Food 
Technol., 1957, 11 (9), Supplement, 9-14. 








Journal of Applied Psychology
Vol. 42, No. 3, 1958


Relations Among Scores on Edwards Personal Preference 
Schedule, California Psychological Inventory, and 
Strong Vocational Interest Blank for an 
Industrial Sample 


Marvin D. Dunnette, Wayne K. Kirchner, and JoAnne DeGidio 
Minnesota Mining & Manufacturing Company, St. Paul 


Two new and relatively recently published 
personality inventories are the Edwards Per- 
sonal Preference Schedule (EPPS) (1) and 
the California Psychological Inventory (CPI) 
(2). Both tests were developed primarily 
for use in counseling and research. The au- 
thors of both tests claim a desire to develop 
measures of relatively normal personality di- 
mensions possessing broader personal and so- 
cial relevance than that possessed by many 
current, more psychiatrically oriented (e.g., 
the MMPI) personality tests. 

The EPPS consists of 210 forced-choice 
items. Each pair of items is matched ap- 
proximately for mean social desirability to 
minimize the effect of the desirability dimen- 
sion on item choice. Items of the test are 
scorable along 15 dimensions ¹ drawn from
Murray’s (4) system of manifest needs. 
Items measuring any one need are paired 
twice with items measuring each of the re- 
maining 14 needs; hence the maximum raw 
score on any need is 28. This test is an ex- 
ample of a rationally developed personality 
schedule. It is a research and counseling 
instrument which may or may not provide 
measures of Murray’s manifest needs. Cer- 
tainly, it is a thoughtfully developed guide 
and, as such, merits the attention of psycholo- 
gists intent on doing research. 
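The item count and the maximum raw score follow directly from the pairing scheme described above, as this short check shows. The variable names are illustrative only.

```python
from math import comb

N_NEEDS = 15                # manifest-need dimensions
PAIRINGS_PER_NEED_PAIR = 2  # each pair of needs is matched twice

# Every unordered pair of needs contributes two forced-choice items.
total_items = comb(N_NEEDS, 2) * PAIRINGS_PER_NEED_PAIR

# A given need appears in two items against each of the other 14 needs,
# so it can be chosen at most 2 * 14 times.
max_raw_score = (N_NEEDS - 1) * PAIRINGS_PER_NEED_PAIR

print(total_items, max_raw_score)  # 210 28
```

One consequence of this design, assuming each item credits exactly one need, is that a respondent's 15 raw scores always sum to 210, so the scores are not independent of one another.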

The California Psychological Inventory is a 
clear example of an empirically developed per- 
sonality test. Methods used in its develop- 
ment essentially are similar to those employed 
with the MMPI. The item format of the 
CPI is similar to that of the MMPI; as a 

¹ The EPPS is scored for Achievement (Ach), Deference
(Def), Orderliness (Ord), Exhibition (Exh), Autonomy
(Aut), Affiliation (Aff), Intraception (Int), Succorance
(Suc), Dominance (Dom), Abasement (Aba), Nurturance
(Nur), Change (Chg), Endurance (End), Heterosexuality
(Het), Aggression (Agg).


matter of fact, a large number of the “less 
disturbing” MMPI items are included in the 
CPI. The test contains 480 items, which may 
be scored along 18 dimensions.² The item
content of most of the scales and the di- 
rection of scoring was determined empirically 
by contrasting responses of groups identified 
(through ratings or other criteria) as pos- 
sessing more or less of a certain broadly de- 
fined personality trait such as Dominance. 

The purpose of the research reported here 
was to compare these two instruments with 
one another and each, in turn, with 11 occu- 
pational area scores obtained from the Strong 
Vocational Interest Blank (SVIB). It is 
hoped that results of these comparisons may 
give added meaning to the various dimensions 
included in these two relatively new person- 
ality tests and also that they may offer clues 
as to the relative utility of these instruments 
in industrial settings. 


Method 


Relationships among SVIB scores, EPPS scores, 
and CPI scores were measured by means of product- 
moment correlations. For this purpose, the follow- 
ing SVIB groupings were employed: 


Human Science = mean of SVIB Group I scores 

Theoretical Technical Science = mean of SVIB 
Physicist & Mathematician scores 

Applied Technical Science = mean of SVIB Engi- 
neer and Chemist scores 

Production Manager = score of SVIB Production 
Manager key 

Skilled Trades = mean of SVIB Group III scores 
Personal Contact (Administrative) = mean of SVIB 


² The CPI is scored for Dominance (Do), Capacity
for Status (Cs), Sociability (Sy), Social Presence 
(Sp), Self-Acceptance (Sa), Sense of Well-Being 
(Wb), Responsibility (Re), Socialization (So), Self- 
Control (Sc), Tolerance (To), Good Impression 
(Gi), Communality (Cm), Achievement via Con- 
formance (Ac), Achievement via Independence (Ai), 
Intellectual Efficiency (Ie), Psychological-Minded- 
ness (Py), Flexibility (Fx), and Femininity (Fe). 




Group V scores on Personnel Director, Public Ad- 
ministrator, YMCA Physical Director, and City 
School Superintendent 

Personal Contact (Service) = mean of SVIB scores 
on YMCA Secretary, Social Science teacher, Social 
Worker and Minister 

Business Detail = mean of scores on office occupa- 
tions of SVIB Group VIII 

Business Management = mean of SVIB scores on 
Banker, Mortician and Pharmacist 

Sales = mean of SVIB Group IX scores 

Verbal = mean of SVIB Group X scores 


The sample on which the correlations are based 
consisted of 102 employees of Minnesota Mining and 
Manufacturing Company including 15 project engi- 
neers, 19 project supervisors, 38 salesmen, and 30 
sales managers. 


Results 


Tables 1, 2, and 3 summarize significant 
correlations (5% and 1% levels) obtained be- 
tween scores from each pair of tests (EPPS 
vs. CPI, EPPS vs. SVIB, CPI vs. SVIB). 
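For a sample of 102, the smallest correlation that reaches each significance level can be worked out from the t distribution. The sketch below assumes two-tailed tests, which the article does not state, and takes the critical t values for 100 degrees of freedom (1.984 and 2.626) from standard tables rather than computing them.

```python
from math import sqrt


def critical_r(t_crit, n):
    """Smallest |r| significant for sample size n, given the critical t.

    Uses t = r * sqrt(df) / sqrt(1 - r**2) with df = n - 2, solved for r.
    """
    df = n - 2
    return t_crit / sqrt(t_crit ** 2 + df)


N = 102        # employees in the sample
T_05 = 1.984   # two-tailed 5% critical t, df = 100 (from standard tables)
T_01 = 2.626   # two-tailed 1% critical t, df = 100 (from standard tables)

r_05 = critical_r(T_05, N)
r_01 = critical_r(T_01, N)
print(round(r_05, 3), round(r_01, 3))
```

Under these assumptions a coefficient must reach roughly .20 to be significant at the 5% level and roughly .25 at the 1% level, which agrees with the smallest legible entries in the tables.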

A brief examination of these tables shows 
that the direction of the association among 
the various variables makes good “clinical 
sense.” For example, occupational interests 
directed toward sales, verbal, and personal 




contact activities show positive association 
with scales that might be expected to measure 
social orientation such as Exhibition and 
Dominance on the EPPS and Dominance, 
Capacity for Status, Sociability and Social 
Presence on the CPI. On the other hand, 
scientific interests and skilled trades interests 
tend to be negatively correlated with the 
above personality measures and positively 
correlated with measures more nearly suggestive
of individual effort and idea orientation
such as Autonomy and Endurance on


the EPPS and Psychological-Mindedness and 
Achievement-via-Independence on the CPI. 
Comparison of CPI and EPPS scales gen- 
erally shows correlations in the expected di- 
rections. Positive associations appear between 
CPI scales labeled Dominance, Capacity for 
Status, Sociability, Social Presence, Self Ac- 
ceptance, Responsibility, Self Control, Tol- 
erance, Good Impression, and Achievement 
via Conformance and EPPS scales labeled 
Achievement, Deference, Exhibition, Affilia- 
tion, Intraception, Dominance, and Endur- 
ance. Negative associations with nearly all 
CPI scales are shown by EPPS measures 


Table 1


Significant Correlations Between Scales of EPPS and Scales of CPI


              EPPS Scales

CPI
Scales    Ach  Def  Ord  Exh  Aut  Int  Suc  End  Het  Agg





Do .20 24 «HW 

Cs . | 21 
Sy —28 35 ; 21 
Sp 26 3 

Sa 29 

Wb 

Re 

So 

Sc 

To JZ 

Gi 

Cm .24 

Ac 

Ai —.24 

Ie — 30 

Py —.24 

Fx —.23 

Fe .23 


— 26 
—.25 





Note.—All the above coefficients are significant at the 5% level; the ones in italics are significant at the 1% level. 










Table 2 
Significant Correlations Between Scales of EPPS and Scales of SVIB 








EPPS Scales 


SVIB Groups Ach Def Ord Exh Aut Aff Int Suc Aba Nur Chg End Het Agg 











Human Sci. 26 —.23 
Theoretical 
Tech. Sci. 20 —.23 
Applied Tech. 
Science : .22 
Prod. Manager 
Skilled Trades 
Outdoor —.22 
Pers. Contact 
(Adm.) 
Pers. Contact 
(Service) 
Business 
(Office Det.) —.20 
Business 
Management 
Sales Jl r a7 
Verbal- 
Linguistic 27 —.23 S35 23 —.25 —.24 26 





Note.—All the above coefficients are significant at the 5% level; the ones in italics are significant at the 1% level. 


Table 3 


Significant Correlations Between Scales of CPI and Scales of SVIB 














SVIB Scales 





Per. Per. Business 
Prod. Skilled Cont. Cont. Off. Bus. 
Mgmt. Trades Admin. Service Det. Mgmt. Verbal 





Cs II 26 

Sy ; ‘ —.3] 42 43 P 31 
Sp ‘ ? —.25 29 26 d Jl 
Sa : ; ‘ —.37 ‘ 4 
Wb , .20 

Re 4 

So 

Sc ; 

To 47 

Gi Jl 

Cm 

Ac 

Ai .20 

Te d 

Py ; 35 d : —.21 

Fx 25 —.23 

Fe 





Note.—All the above coefficients are significant at the 5% level; the ones in italics are significant at the 1% level. 







labeled Order, Succorance, Abasement, Het- 
erosexuality, and Aggression. 

These findings are in good agreement with 
results reported by Merrill and Heathers (3) 
showing the EPPS measures of Succorance, 
Abasement, and Aggression to be related posi- 
tively to “unhealthy” profiles on the MMPI. 

Since the CPI Good Impression scale is 
more or less a combination of the MMPI L 
and K scales, we also may derive an inference 
concerning so-called “faking” tendencies on 
the EPPS. Apparently, efforts (either con- 
scious or unconscious) to create a good im- 
pression on EPPS ordinarily will be accom- 
panied by elevated scores on Achievement, 
Deference, Affiliation, Intraception, Domi- 
nance, and Endurance and by depressed 
scores on Succorance, Heterosexuality, and 
Aggression. 

To a great extent then, the correlations 
shown among scales in this study are reflec- 
tions of tendencies to be dominant, confident, 
and sociable on the one hand as opposed to 
tendencies toward permissiveness, depend- 
ency, and individualistic activities on the 
other. Scales designed to measure sales, 
verbal, personal contact interests, and domi- 
nance, sociability, achievement motivation, 
social presence, and the like show highest 
positive intercorrelations. 




It is evident that industrial use of these 
two tests probably should be restricted to 
situations in which counseling or vocational
guidance is the major purpose. Their use as 
instruments to aid in the selection and place- 
ment of job applicants will be greatly en- 
hanced with the accumulation of information 
on the effects of test-taking attitudes and pos- 
sible “faking” tendencies on scores derived 
from them. In the meantime, these tests, 
used together and in combination with measures
of vocational interests and ability, can be
useful aids in the individual career guidance 
and counseling of industrial employees. 


Received August 5, 1957. 


References 


1. Edwards, A. L. Manual for the Edwards Per- 
sonal Preference Schedule. New York: Psy- 
chological Corp., 1954. 

2. Gough, H. G. Manual for the California Psy- 
chological Inventory. Palo Alto: Consulting 
Psychologists Press, 1957. 

3. Merrill, R. M., & Heathers, Louise B. The rela- 
tion of the MMPI to the Edwards Personal 
Preference Schedule on a college counseling 
center sample. J. consult. Psychol., 1956, 20, 
310-314. 

4. Murray, H. A. Explorations in personality: A 
clinical and experimental study of fifty men 
of college age. New York: Oxford Univer. 
Press, 1938. 








Journal of Applied Psychology 
Vol. 42, No. 3, 1958 


Output Rates Among Coil Winders 


Harold F. Rothe and Charles T. Nye 
Fairbanks, Morse and Company, Beloit, Wisconsin 


A series of previous papers has shown that 
the output rates, or production, of various 
groups of industrial employees tend to be
relatively inconsistent from one period of time 
to another. It has been hypothesized that 
this inconsistency might be a function of the 
incentivation, or lack of incentivation, in the 
various situations. It has also been suggested 
that this inconsistency is not the same thing 
as low “reliability”; rather, that output is it- 
self a phenomenon deserving study. “The 
proper subject for the study of industrial out- 
put is industrial output itself.” 

One study revealed different daily work 
curves from one day to another rather than a 
“typical daily work curve” (1). A second 
study showed a low correlation between the 
average production for one two-week period 
compared with the average production for the 
following two-week period. There was no 
financial incentive system in operation in that 
plant (3). The third study showed a higher 
correlation between the average production of 
one week compared with the average produc- 
tion of the next week, covering a period of 16 
weeks, in a plant that did have a financial in- 
centive system (4). Even in this latter situa- 
tion, however, the week to week consistency 
was lower than the consistency commonly de- 
scribed in textbooks (5). 

In the present study, data were again taken 
from the official books of a manufacturing 
concern and the week to week consistency for 
a group of employees was determined. The 
ratio of interindividual differences and the 
ratio of intra-individual differences were also 
obtained. These two measures, the consist- 
ency and the ratios, were analyzed in the 
light of the hypotheses previously put forth. 


Background of the Study 


The data used here were taken from the 
books of a Midwest manufacturing plant. 
They cover a group of 27 employees and a 
period of 38 successive weeks from June 1956 


to March 1957. (Actually 39 weeks were 
covered but one week in December was 
omitted because the plant closed for inven- 
tory.) The employees were mainly women 
and all were experienced on their jobs. There 
were no “learners” in the group. Although 
there were some slight variations in the jobs,
they fall into three basic jobs described in the
U.S.E.S. Dictionary of Occupational Titles as
Coil Winder, 6-99.014; Rotor-Coil Winder,
6-99.112; and Stator-Coil Winder, 6-99.131.
All employees in the plant, including those 
involved in this study, were members of a 
national union under a union-shop contract. 

There was no financial incentive system in 
effect. There had been one, but it had been 
removed about five years earlier. The em- 
ployees were performing their regular jobs in 
their regular workplaces, and each employee 
governed her own work pace. (No moving 
belts, no long machine runs, etc.) The data 
were used for each week in which the em- 
ployee worked 32 or more hours. Thus, from 
time to time, the size of the sample dropped 
below 27 employees. However, in no week 
were there fewer than 21 employees. 


Main Findings 


The weekly average output for the group, 
and also the number of employees whose data 
were used in this analysis for each week, are
shown in Table 1. Inspection of this table 
shows there was an increase in performance 
early in the period studied and that the group 
performance later stabilized at a plateau. It 
is noteworthy that there was a change in de- 
partmental foremen at the beginning of this 
study. A forelady who had previously super- 
vised this department but who had been trans- 
ferred to another department was transferred 
back to the Coil Winding Department at the 
time that happened to be selected for this 
study. Output climbed immediately upon her 
return and stabilized again at the high level 




it had reached previously when this forelady 
was supervising operations. Although this 
improved production is undoubtedly a tribute 
to this forelady, it is also an uncontrolled 
variable in this study. It is doubtful if the 
rise in productivity affected the results of this 
study, but this does indicate the difficulties 


Table 1 


Weekly Average Output (Percentage Performance of 
for Group of Coil Winders 


                  Weekly
                  Percentage       Number of
Week Ending       Performance      Employees

June 17              76.1             27
     24              76.2             27
July 1               78.9             27
     8               80.6             21
     15              78.6             27
     22              82.2             27
     29              84.7             27
August 5             83.3             25
     12              83.7             26
     19              88.0             26
     26              78.5             24
September 2          85.3             24
     9            [illegible]         25
     16           [illegible]         26

[Remaining rows are illegible in the source.]


Table 2 


Frequency Distribution of r’s Between Successive 
Week’s Output. Individual Performance 
for Group of Coil Winders 








r              Frequency

 .91-1.00          1
 .81- .90          7
 .71- .80
 .61- .70
 .51- .60
 .41- .50
 .31- .40
 .21- .30
 .11- .20
 .01- .10
-.09- .00

[Frequencies for intervals below .81 are illegible in the source.]


Note.—Median r = .64. 


involved in attempting to do scientific re- 
search in an industrial situation. 

The correlation of each employee’s perform- 
ance for one week with his or her performance 
for the following week was determined by the 
method of Pearsonian r. The distribution of 
the obtained r’s is shown in Table 2. The 
median r is .64; the highest r is .91 and the
lowest r is -.03. Thus it is concluded that
the week to week output was not particularly 
consistent, and also that there was an ex- 
tremely large variation in consistency. This 
latter point is important for psychologists at- 
tempting to validate tests (or other activities) 
against production data. It shows the need 
for taking production data over a fairly long 
period of time. If a psychologist happened to 
select the two weeks correlating .91 he would 
undoubtedly be most happy, and if he hap- 
pened to select the two weeks correlating 
-.03, he would be most unhappy.
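The consistency measure can be sketched as below. The reading assumed here is that one Pearson r is computed across employees for each pair of successive weeks, giving 37 coefficients for 38 weeks. The output figures are invented, and `None` marks a week in which an employee's data were not used (fewer than 32 hours worked).

```python
from math import sqrt
from statistics import median


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def successive_week_rs(output):
    """One r per pair of adjacent weeks, using employees present in both."""
    n_weeks = len(next(iter(output.values())))
    rs = []
    for week in range(n_weeks - 1):
        pairs = [
            (weeks[week], weeks[week + 1])
            for weeks in output.values()
            if weeks[week] is not None and weeks[week + 1] is not None
        ]
        xs, ys = zip(*pairs)
        rs.append(pearson_r(xs, ys))
    return rs


# Hypothetical weekly percentage performance for five employees.
output = {
    "A": [80, 85, 90, 88],
    "B": [60, 62, None, 65],
    "C": [100, 95, 97, 99],
    "D": [70, 78, 75, 72],
    "E": [90, 80, 85, 95],
}

rs = successive_week_rs(output)
median_r = median(rs)
print(len(rs), round(median_r, 2))
```

Reporting the whole distribution of r's, rather than a single pair of weeks, is what guards against the lucky or unlucky choice described above.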

The greatest and least amount of produc- 
tivity for each employee for any one of the 
38 weeks is shown in Table 3, together with 
the ratio of best to worst performance of each 
employee. The average (median) ratio of 
best to worst performance or intra-individual 
ratio is 2.24. 

The ratio of best operator to worst operator 
for each week—the interindividual ratio—is 
shown in Table 4 where the average (median) 
ratio is 2.06. Thus the average ratio of the 







range of intra-individual performances ex- 
ceeds the average ratio of the range of inter- 
individual performance. This was also true 
in the study of butter-wrappers who were also 
working under nonincentive conditions (2). 
But the opposite was true (i.e., the average 
ratio of the range of inter-individual perform- 
ance exceeded the average ratio of the range 
of intra-individual performance) in the study 
of chocolate-dippers who, perhaps by no co- 
incidence, were on a financial incentive sys- 
tem (4). 
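The two ratios compared above can be computed from a weeks-by-employees table as in the sketch below. The layout, names, and figures are hypothetical and serve only to show the arithmetic.

```python
from statistics import median


def intra_individual_ratios(output):
    """For each employee: best weekly performance divided by worst."""
    n_employees = len(output[0])
    return [
        max(week[e] for week in output) / min(week[e] for week in output)
        for e in range(n_employees)
    ]


def inter_individual_ratios(output):
    """For each week: best operator divided by worst operator."""
    return [max(week) / min(week) for week in output]


# Hypothetical data: output[w][e], 3 weeks x 3 employees.
output = [
    [80, 100, 60],
    [120, 90, 50],
    [60, 110, 70],
]
median_intra = median(intra_individual_ratios(output))
median_inter = median(inter_individual_ratios(output))

# Under the hypothesis quoted from the earlier study, incentives would be
# considered ineffective when the intra-individual ratio is the larger one.
incentive_looks_ineffective = median_intra > median_inter
```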

It is also perhaps important to note that the
ratios found in this situation were much
larger than the ratios found in the study of
chocolate-dippers (although smaller than the
ratios found in laboratory studies). In that
study, the median interindividual ratio was
1.475 and the median intra-individual ratio
was 1.18, as contrasted with the 2.06 and 2.24
found here, respectively.


Table 3

Highest and Lowest Average Weekly Performances,
and Their Ratios, for Individual Coil Winders
During 38-Week Period

Columns: Employee; Highest Weekly Average; Lowest Weekly Average;
Ratio of Highest to Lowest

Employees: A through AA
Highest and lowest weekly averages: 142, 81; 114, 61; 106, 46; 107, 41
Ratios of highest to lowest: 2.51, 7.20, 3.73, 1.63, 2.09, 2.62, 2.66,
3.33, 2.24, 1.69, 2.31, 1.52, 3.16, 2.02, 1.85

Note.—Median intra-individual ratio = 2.24.


Table 4

Highest and Lowest Average Individual Weekly Performances,
and Their Ratios, for Group of
Coil Winders During 38-Week Period

                Highest     Lowest      Ratio of
                Employee's  Employee's  Highest
Week Ending     Average     Average     to Lowest

June 17         97          51          1.90
     24         98          49          2.00
July            102         33          3.09
                110         56          1.96
                111         46          2.41
                105         59          1.78
                105                     4.04
                119         47          2.53
                114         67          1.70
                180         50
                116         54
September 2     118         49
     9          115         58
                114         56
     23         142         63
     30         154         43
October 7       198         57
     14         115         37
     21         113
     28         117         28
November 4      151         55
                127         71
                117         74
                105         69
December        108         59
                116         44
                126         61
                134         71
                144         58
                119         63
                118         51
                121         61
                111         59
                114         73
                116         43
                115         57
                117         68
                116         71

Note.—Median interindividual ratio = 2.06.







Other Findings 


Since the correlation of output from one
week to the next week was so low (r = .64),
the data were combined in various ways to 
determine the effect of using longer periods 
of time. The most obvious combination was 
to split the data—to correlate the average 
production of each of the 27 operators for the 
first 19 weeks with their average production 
for the second 19 weeks. The Pearsonian r 
is .71 which is low for a work sample of this 
size. 

Another r was obtained using the average 
production of each operator for the four week 
periods of greatest plant employment (i.e., 
when the number of employees in the plant 
was greatest). The employees whose output 
data were used in this study formed only a 
part of one department in a very large plant. 
Here, as probably almost everywhere, the 
grapevine carries stories of increasing or de- 
creasing sales and corresponding rises and 
falls in employment. Thus it was believed 
that there may be some relationship between 
output and size of the plant labor force. 
There were two peaks of employment in the 
period covered by this study. Production
data for the four weeks leading up to and
including each of the two peaks were used. The
peaks were about six months apart from each
other. The obtained r = .25, which suggests
a lack of common variables influencing output
during those two periods.

Along the same line of reasoning, the aver- 
age output for each operator during a five- 
week period of decreasing employment was 
correlated with the five-week average during 
a period of increasing employment. The ob- 
tained r = .60. The writers suspect that the 
r of .60 found here, and the r of .25 in the 
preceding paragraph are merely chance varia- 
tions. 

The average production for the entire group 
of 27 operators for each week was correlated 
with the size of the total plant labor force for 
that week, and the resulting r = -.39. This
means that as the employee force decreases
the average production of these coil winders
increases, and vice versa. Although this
correlation of -.39 is statistically significant
between the 1% and 5% levels, it should be
realized that it is an indication from one
department in one plant. Other data from other
situations are needed before much meaning
can be attached to these data.

The correlation between total weekly plant
employment and weekly output variance of
these coil winders is -.02; between number
of total plant’s employees on layoff and
average production of these coil winders,
r = -.03; between total number on layoff and
variance of coil winders’ production, r = .02;
between number employed in coil winding
department and average production of these 27
operators, r = .07; and between number
employed in this department and variance of
output of these 27 operators, r = -.16. All
of these correlations are, of course,
insignificant.

Discussion 


It has been hypothesized that “. . . the in- 
centives to work may be considered ineffec- 
tive when the ratio of the range of intra-indi- 
vidual differences is greater than the ratio of 
the range of inter-individual differences” (2, 
p. 326). In the present situation, where there 
was no financial incentive system, the intra- 
individual ratio did exceed the interindividual 
ratio. And in a previous study, with no in- 
centive system in effect, this same relation- 
ship between the ratios of inter- and intra- 
individual differences was found, while in a 
situation where an incentive system was in 
effect, the opposite relationship was found. 
The hypothesis is clearly not proven by this 
study, but these various studies do seem to 
point clearly toward a relationship between 
incentivation and inter- and intra-individual 
differences. 

A second hypothesis was “if the intercorre- 
lation of output rates for two periods closely 
related in time is less than .50, the incentiva- 
tion is not highly effective, while intercorrela- 
tion higher than .80 indicates very effective 
incentivation” (4, p. 96). The present facts 
are generally consistent with this hypothesis, 
but they vary in amount (or size of coeffi- 
cient). In the light of the present study this 
hypothesis is now changed to say that an in- 
tercorrelation of .80 or above indicates effec- 
tive incentivation and an intercorrelation of 
.70 or less indicates ineffective incentivation. 










This leaves a twilight zone between .70 and
.80 that needs clarification from further
research. (It also tempts one to speculate on
the chaos that might exist if a negative or
insignificant intercorrelation were to exist!)

The output data were correlated with vari- 
ous other variables such as size of employee 
force and number of employees on layoff, but 
the obtained r’s were insignificant. 

Grouping the weekly output data into 4, 5, 
and 19 week periods and correlating the data 
for these longer periods did not increase the 
r significantly over the r for single weeks’ out- 
puts correlated. 

This study, along with the other output
studies (1, 2, 3, 4), again shows that
production data cannot be picked up casually and
used to validate tests or other procedures. In
this entire series of studies of industrial
output the most striking single result is the lack
of consistency from time to time, especially
when there is no financial incentive system in
operation. A second important result is the
wide range of “consistency coefficients” of
output data, such that a researcher could be
entirely misled by tests of statistical
significance if he just happened to select a period
of unusually high or low consistency.


Received September 9, 1957. 


References 


1. Rothe, H. F. Output rates among butter wrappers:
I. J. appl. Psychol., 1946, 30, 199-211.

2. Rothe, H. F. Output rates among butter wrappers:
II. J. appl. Psychol., 1946, 30, 320-327.

3. Rothe, H. F. Output rates among machine operators:
I. J. appl. Psychol., 1947, 31, 484-489.

4. Rothe, H. F. Output rates among chocolate dippers.
J. appl. Psychol., 1951, 35, 94-97.

5. Tiffin, J. Industrial psychology. New York:
Prentice-Hall, 1942.





Journal of Applied Psychology 
Vol. 42, No. 3, 1958 


Dimensions of Work Satisfaction in the Occupational 
Choices of College Freshmen 1


Alexander W. Astin 
US Public Health Service Hospital, Lexington, Kentucky 


One limitation of the standard interest in- 
ventories as aids in vocational counseling is 
the heavy emphasis in these tests upon in- 
trinsic sources of work satisfaction. Students 
who seek vocational counseling often appear 
to be motivated primarily by desires for so- 
cial prestige, economic rewards, etc., rather 
than by an interest in specific work activities. 
Several studies (1, 5, 6) have demonstrated 
significant relationships between such extrin- 
sic aspects of work satisfaction and the voca- 
tional choices of college students. 

The purposes of this study were: (a) to de- 
termine some of the motivational variables 
underlying both intrinsic and extrinsic aspects
of work satisfaction; and (b) to relate
these variables to the vocational choices of 
college freshmen. 


Procedure 


The initial task was to develop a set of items to 
represent a wide variety of potential satisfactions 
from work. Ginzberg and associates (4) reported 
three general areas of anticipated work satisfaction 
from their sample: monetary and prestige rewards; 
intrinsic satisfactions; and “concomitant” satisfac- 
tions (from the physical and social working environ- 
ment). These three categories served as general 
guideposts in the selection of items. Specific item 
material was culled from the literature on occupa- 
tional choice and also from a questionnaire which 
was given to the staff of the University of Maryland 
Counseling Center. A preliminary set of 22 items 
was given to four experienced psychologists to evalu- 
ate according to the following criteria: clarity in 
wording; relative independence of meaning; lack of 
obvious emotional content; relevance to the poten- 
tial occupations of college students; and comprehen- 
siveness within each of the Ginzberg categories. 

1 This article is a condensation of the author’s
doctoral dissertation, completed at the University of
Maryland in 1957 under the direction of John W.
Gustad. This dissertation, under the same title, is
on file in the University of Maryland Library,
College Park, Maryland.

A final set of 21 items was developed for study.
This was administered to 355 male college freshmen
who were asked to respond to each item on a
seven-point Desirability scale. In order to reduce
computational labor, a representative sample of 200 Ss was
selected at random (within colleges) from the
original 355 freshmen. Pearson correlations among the
item scale scores for these 200 Ss were computed,
yielding a matrix of 210 intercorrelations.
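The figure of 210 intercorrelations follows from counting the distinct pairs among 21 items, as the short check below shows. The function name is illustrative only.

```python
from math import comb


def n_intercorrelations(n_items):
    """Number of distinct item pairs, i.e. n(n - 1) / 2."""
    return comb(n_items, 2)


pairs_for_21_items = n_intercorrelations(21)  # 21 * 20 / 2
```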

A “B-coefficient” cluster analysis (3, pp. 12-17) 
was performed on the matrix of item intercorrela- 
tions. In order to test the relationships between the 
obtained clusters and the students’ occupational 
choices, 196 of the original 355 freshmen were dis- 
tributed according to their stated vocational choices 
into the following nine criterion groups: accounting, 
engineering, farming, managerial, medicine-dentistry, 
persuasive, sales, teaching, and undecided. These 
criterion groups, it should be pointed out, repre- 
sented the claimed choices of the Ss and did not rep- 
resent their measured interests. After a weighting 
system for obtaining cluster scores was decided upon, 
analyses of variance were performed separately for 
each cluster on the nine criterion groups. Intercor- 
relations among the cluster scores were also com- 
puted. 


Results and Discussion 


The intercorrelations among the 21 work 
satisfaction items tended to be low, ranging 
from -.293 to +.430. Nevertheless, 61 of
the correlations surpassed the .05 level of sig- 
nificance and 31 surpassed the .01 level. 

The cluster analysis produced four fairly 
distinct clusters. These clusters, together 
with paraphrases of their respective items, 
will be presented and discussed separately. 
(In order to facilitate discussion, the terms 
“cluster” and “trait” will henceforth be used 
interchangeably.) 


Cluster I—“Managerial-aggressive”


Items


1. Control other employees.

2. Influencing, persuading others.

3. Working under stress.

4. Taking orders (negative).

5. Expressing personal ideas and feelings.

6. Keeping very busy.

7. Frequent change of duties.




This cluster suggests a very aggressive, in- 
dependent, and perhaps hyperactive person. 
The Ss scoring high on this cluster would be 
expected to tend to choose occupations re- 
quiring an aggressive, dominant role, such as 
sales, managerial, and persuasive. 


Cluster II—“Status-need” 


Items 


1. High salary with uninteresting work. 

2. High salary with uncertain success. 

3. Recognition by others with uninterest- 
ing work. 

4. Frequent travel. 

5. Living in a large city. 


Cluster II appears to measure the extent to
which a person is concerned with the economic
and social prestige rewards which accrue
from his work. Since the first three
items in this trait are somewhat similar in
content and wording, it could be argued that
the cluster reflects primarily similarities in
construction, rather than different aspects of
a status motivation in the students. The
presence in the cluster of the final two items,
however, supports the latter interpretation, in that
the stereotype of the wealthy, famous person
is compatible with such characteristics as
“traveling out of town frequently” and
“living in a large city.”


Cluster III—“Organization-need”


Items


1. Making verifiable judgments.

2. Attention to accuracy.

3. Working at a set time schedule.

4. Having co-workers with similar interests.

5. Working closely with others.

6. Keeping very busy.


The first three items in this cluster indicate a
need to structure and organize both the work
and the working schedule. The fourth item
hints at a need for a homogeneous (structured?)
working environment, while the fourth
and fifth items together suggest a dependency
need. The last item was not included in the
B-coefficient analysis, because it did not
correlate significantly with items 4 and 5; it did,
however, correlate significantly with the first
three items. Ss scoring high on this trait
would be expected to choose occupations
requiring scientific training, e.g., engineering
and medicine-dentistry.


Cluster IV— 
Items 


1. Working indoors. 

2. Physical activity (negative). 
3. Living in a large city. 

4. Frequent change (negative). 


The absence of obvious meaning in this bi- 
polar cluster precluded assigning it a name or 
making a confident interpretation regarding 
its psychological significance. Even though 
the intercorrelations were entirely consistent 
with respect to sign, they tended to be some- 
what smaller than those in the first three 
clusters. 

At this point, some comparisons might be 
made between the first three clusters and the 
original classifications of the Ginzberg group. 
The only Ginzberg category whose items
remained in a single cluster was
“monetary-prestige.” These items occurred only in
Cluster II (i.e., Items 1, 2, 3). This same cluster,
however, also contained items representing 
“concomitant” satisfactions. Cluster I con- 
tained only “intrinsic” items, while Cluster 
III contained both “intrinsic” and “concomit- 
ant” items. It thus appears that none of the 
Ginzberg categories, with the possible ex- 
ception of “monetary-prestige rewards,” was 
actually reproduced in the cluster analysis. 

To obtain trait scores for each of the Ss, 
decisions had to be made regarding the items 
to be included and the weighting system to 
be employed. (Cluster IV was not included 
in the validity analysis, since it was felt that 
any significant results would not be readily 
interpretable.) In order to remove as much 
error as possible from each cluster without 
destroying its essential meaning, the items 
which produced marked drops in the B co- 
efficient were excluded from the clusters. 
Thus, the last two items in each of the first
three clusters (see above) were excluded.

Since items of this nature will tend to
weight themselves primarily in terms of their
individual variabilities, a unit weighting
system was employed. The “trait score” for
each individual consisted, therefore, of the
sum of his Desirability scale scores for the
items retained in the cluster. Item 5 on
Cluster I, which correlated negatively with
the other items, was scored in reverse. That
is, a person with a scale score of 7 (very
desirable) would be assigned a weight of 1 and
so on.


Table 1

Analysis of Variance for Trait I
(Managerial-Aggressive)

Source of
Variation       df      MS        F

Occupations             88.333    4.6704*
Error                   18.913

Total

*p < .01.
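The unit-weighted trait score with reverse scoring, as described in the text, can be sketched as below. The function names and the example responses are hypothetical; the only facts taken from the article are the seven-point scale, the unit weights, and the rule that a reversed 7 becomes 1.

```python
def reverse_score(score, low=1, high=7):
    """Reverse a scale score on a low..high scale (7 -> 1, 6 -> 2, ...)."""
    return high + low - score


def trait_score(responses, reversed_items=()):
    """Sum of Desirability scale scores with unit weights.

    responses: scale scores for the retained items, in item order.
    reversed_items: 1-based positions of items scored in reverse.
    """
    return sum(
        reverse_score(s) if position in reversed_items else s
        for position, s in enumerate(responses, start=1)
    )


# Hypothetical subject: five retained items, the fifth scored in reverse.
example = trait_score([6, 5, 4, 7, 2], reversed_items={5})
```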

The variances of the nine occupational 
groups on Trait I (Managerial-aggressive) 
were compared by Bartlett’s Test and found 
to be homogeneous. The analysis of variance
on Trait I (Table 1) produced an F ratio of 
4.6704, which was found to be significant 
(p < .01). Students choosing sales, persua- 
sive, and managerial occupations obtained the 
highest scores on this trait, while occupation- 
ally undecided students and those desiring to 
work as farmers, accountants, and engineers 
received the lowest scores. 
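As a check on the reported statistics, the F ratio is the mean square between occupations divided by the error mean square. The sketch below recomputes it from the two mean squares printed for Trait I and also shows a generic one-way analysis of variance on made-up groups. The function names and the toy data are assumptions, and the small difference in the last decimal reflects rounding of the published mean squares.

```python
from statistics import mean


def f_ratio(ms_between, ms_error):
    """F = mean square between groups / mean square within groups."""
    return ms_between / ms_error


def one_way_anova(groups):
    """Return (ms_between, ms_error, F) for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    ms_b, ms_w = ss_between / df_between, ss_within / df_within
    return ms_b, ms_w, ms_b / ms_w


# Mean squares as printed for Trait I.
f_trait_1 = f_ratio(88.333, 18.913)

# Toy data (not from the article): three groups of three scores.
ms_b, ms_w, f_toy = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```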

Trait II (Status-need) was also tested by 
the analysis of variance (Table 2) and the 
obtained F ratio was not found to be signifi- 
cant (p > .05). 

The analysis of variance employing Trait
III (Organization-need) (Table 3) produced
an F ratio of 1.4654, which was not
significant (p > .05). In order to test the
proposed interpretation of this trait more
directly, the nine criterion groups were
reclassified into two larger samples, Science and
Nonscience. This was accomplished simply
by combining the medicine-dentistry and
engineering groups into one sample (Science)
and the other seven groups into another
sample (Nonscience). This decision was made
before actually computing the means for the
original nine subgroups.


Table 2

Analysis of Variance for Trait II (Status-Need)

Source of
Variation

Occupations
Error

Total


Table 3

Analysis of Variance for Trait III
(Organization-Need)

Source of
Variation       df      MS        F

Occupations     8       18.496    1.4654*
Error                   12.622

Total

*p > .05.

The means and standard deviations on
Trait III for the Science and Nonscience
groups are presented in Table 4. The
variances were found to be significantly different
(p < .05), precluding the use of a
conventional t ratio to test the means. Employing
a technique described by Edwards (2, pp.
167-168), a t ratio of 2.7378 was computed.
This surpassed the value of t needed to reject
the null hypothesis in the absence of
assumptions about the population variance. It thus
appears that the two groups did differ
significantly with respect to both central tendency
and variability. This finding is consistent
with the previous interpretation of Trait III,
in that students who choose occupations
requiring scientific training would be expected
to show more need for organization than
students selecting nonscientific careers.
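A t ratio that does not pool the two variances can be sketched as below. This uses the familiar separate-variance (Welch) form with Satterthwaite degrees of freedom, which is an assumption: the article cites Edwards for its technique and does not give the formula, so this may not be the exact procedure used. The data are invented.

```python
from math import sqrt
from statistics import mean, variance


def separate_variance_t(x, y):
    """t ratio using separate (unpooled) variance estimates.

    Returns (t, df), where df is the Welch-Satterthwaite approximation.
    """
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Hypothetical trait scores: a less variable group and a more variable one.
science = [12, 14, 13, 15, 16]
nonscience = [10, 9, 14, 8, 17, 6]
t_ratio, approx_df = separate_variance_t(science, nonscience)
```

The approximate degrees of freedom always fall between the smaller group's n - 1 and the pooled n1 + n2 - 2, which is why the test is more conservative when variances differ.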


Table 4 


Means and Standard Deviations on Trait III for 
Science and Nonscience Students 





Nonscience 
(n = 123)





11.74 
3.74 










Table 5 
Intercorrelations of the Trait Scores for the 196 Ss 








I II III





I 02 02 
II 02 





The engineering and medicine-dentistry
groups were ranked first and third,
respectively, among the original nine subgroups on
Trait III. The second highest scorers on this
trait were the accountants. The occupation
of accounting, although not requiring
extensive scientific training, can, nevertheless, be
characterized as a highly structured form of
work. It must be remembered, however, that
the F ratio comparing these original
subgroups was not significant, so that post-hoc
speculations of this nature are, at best, highly
tentative.

None of the intercorrelations among the 
first three traits was significantly different 
from zero, indicating statistical independence 
among the Ss’ scores on each trait. The rela- 
tions among the motivational dimensions rep- 
resented by these traits would have to be de- 
termined with more reliable measurements. 

The results of this study suggest the im- 
portance of examining the vocational choice 
process in terms of both extrinsic and in- 
trinsic sources of work satisfaction. Further 
exploration of these and perhaps other traits, 
using a more comprehensive set of items and 
a variety of criteria, might provide an em- 
pirical basis for a generalized theory of voca- 
tional decision-making. In addition, knowl- 
edge of such traits and their relevance to the 
vocational choice process would be of consid- 
erable use to the practicing counselor in his 
attempts to understand and predict the be- 
havior of his clients. 


Summary 


1. This study attempted to determine some 
of the variables underlying various aspects of 
work satisfaction and also to relate these vari- 
ables to the occupational choices of male col- 
lege freshmen. 

2. Twenty-one items, presumed to cover the
three areas of work satisfaction proposed by
Ginzberg and associates, i.e., intrinsic,
monetary-prestige, and concomitant, were selected
for study. These were administered to 355
male college freshmen, who were asked to
respond to each item on a seven-point
Desirability scale. Intercorrelations among the
scale scores were obtained from a
representative subsample of 200 of these Ss. A cluster
analysis of the matrix produced four clusters
of items, three of which were interpretable.

3. Cluster I (Managerial-aggressive) ap- 
peared to represent a need to dominate in in- 
terpersonal relations and to control the be- 
havior of others in the working situation. 
The Ss selecting careers in sales, managerial, 
and persuasive occupations obtained the high- 
est scores, while vocationally undecided Ss 
and Ss desiring to work as farmers and engi- 
neers obtained the lowest scores. 

4. Cluster II (Status-need) was interpreted 
as a concern with the monetary and social 
prestige outcomes of work. This cluster was 
not found to be significantly related to the vo- 
cational choices of the Ss. 

5. Cluster III (Organization-need) ap- 
peared as a desire to structure and organize 
both the work and the job environment. The 
Ss selecting occupations requiring scientific 
training scored significantly higher on this 
cluster than Ss choosing nonscience careers. 

6. None of the original Ginzberg categories,
with the possible exception of
“monetary-prestige” (Cluster II), was actually
reproduced in the cluster analysis.

7. None of the intercorrelations among the 
three clusters was significantly different from 
zero. 


Received September 23, 1957. 


References 


1. Dickinson, C. Ratings of job factors by those 
choosing various occupational groups. J. 
counsel. Psychol., 1954, 1, 188-189. 

2. Edwards, A. L. Experimental design in
psychological research. New York: Rinehart, 1950.

3. Fruchter, B. Introduction to factor analysis.
New York: Van Nostrand, 1954.

4. Ginzberg, E., Ginsburg, S. W., Axelrad, S., &
Herma, J. L. Occupational choice. New
York: Columbia Univer. Press, 1951.

5. Hammond, Marjorie. Motives related to
vocational choices of college freshmen. J. counsel.
Psychol., 1956, 3, 257-261.

6. Ziller, R. C. Vocational choice and utility for
risk. J. counsel. Psychol., 1957, 4, 61-64.








Journal of Applied Psychology 
Vol. 42, No. 3, 1958 


Gain in Proficiency as a Criterion in Test Validation 1


Winton H. Manning and Philip H. DuBois 


Washington University, St. Louis, Missouri 


In the validation of selection tests, final 
grade in a training course is often the cri- 
terion. Because of practical difficulties, on- 
the-job performance has been used less fre- 
quently. A third relevant criterion has been 
generally overlooked, namely, gain-in-pro- 
ficiency or improvement through training. 

Under commonly existing conditions, final 
grade may not adequately represent the per- 
formance of students in a training course. 
Consider a situation in which: (a) trainees 
differ in their initial level of performance, 
that is, their prior education and experience 
has led to diversity in pretraining proficiency;
and (b) the training curriculum does not
ordinarily result in mastery of the job but 
rather develops skills fundamental to efficient 
learning on-the-job. In this training situa- 
tion, improvement or gain-in-proficiency may 
constitute a more significant dimension of 
student performance than does final standing. 

If we should decide, on rational grounds, 
that a gains criterion is relevant to our train- 
ing situation, there still remains the problem 
of measuring improvement. Past researches 
have typically used as the measure of im- 
provement a crude gain score, that is, the 
simple arithmetic difference between scores 
on pretraining and post-training proficiency 
measures. However, when learning is meas- 
ured by crude gain scores, gain appears to be 
not only uncorrelated with aptitude measures 
but also uncorrelated with other measures of 
gain (3). There is reason to suspect that this 
apparent unrelatedness of gain scores is de- 
rived from a peculiarity of the crude gain 
measure itself. 

1 Based on a paper read September 4, 1957, at the
meeting of the APA and prepared under Contract
Nonr 816(02) between Washington University and
the Office of Naval Research. Opinions expressed
are those of the authors and are not to be construed
as representing the endorsement of the Department
of the Navy.

Recently DuBois (1) described an application
of correlational analysis to the measurement
of improvement which overcomes many
disadvantages of the crude gain measure.
Residual gain, which is advocated as preferable
to crude gain, is defined as that portion
of the measure of final status which is not
correlated with initial status. Specifically, in
terms of z scores, a residual gain score ($z_{2 \cdot 1}$)
represents the difference between actual final
proficiency ($z_2$) and final proficiency
predicted from initial status ($r_{12} z_1$).
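The definition above, final status in z-score form minus the part predicted from initial status, can be sketched as follows. The function names and the pretest and post-test scores are hypothetical. The last lines illustrate the defining property: residual gain is uncorrelated with initial status, whereas crude gain generally is not.

```python
from math import sqrt
from statistics import mean, pstdev


def z_scores(x):
    """Standardize scores to mean 0 and (population) SD 1."""
    m, s = mean(x), pstdev(x)
    return [(v - m) / s for v in x]


def pearson_r(x, y):
    """Pearson r computed as the mean product of z scores."""
    return mean([a * b for a, b in zip(z_scores(x), z_scores(y))])


def crude_gain(pre, post):
    """Simple arithmetic difference, post minus pre."""
    return [b - a for a, b in zip(pre, post)]


def residual_gain(pre, post):
    """z_2.1 = z_2 - r_12 * z_1 for each trainee."""
    r12 = pearson_r(pre, post)
    return [z2 - r12 * z1 for z1, z2 in zip(z_scores(pre), z_scores(post))]


# Hypothetical pretest and post-test scores for five trainees.
pre = [20, 25, 18, 30, 22]
post = [45, 50, 40, 49, 52]

r_residual_with_pre = pearson_r(pre, residual_gain(pre, post))  # ~0 by construction
r_crude_with_pre = pearson_r(pre, crude_gain(pre, post))
```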

The purpose of this study was to compare 
the predictability of three criteria of student 
proficiency. Two questions were of particular 
interest: (a) Of three criteria (crude gain,
residual gain, and final status), which is most
correlated and which least correlated with
each of several selected aptitude measures?
(b) As compared with more conventional
tests, will measures obtained from a simple 
learning task contribute unique variance to 
the prediction of gain or final proficiency in a 
complex technical skill acquired in a class- 
room training situation? 


Procedure 


The Ss were 213 trainees from 13 successive 
classes in the aircraft ignition phase of the 
U.S. Navy Aviation Machinists Mates’ School 
(Advanced). All were enlisted naval person- 
nel with ratings of second class petty officer 
or above. 

On the first day of training, one form of a 
comprehensive examination in aircraft ignition 
was administered as the pretraining pro- 
ficiency measure. This test consisted of 60 
five-choice, multiple-response questions. Fol- 
lowing three weeks of training in aircraft ig- 
nition an alternate form of this achievement 
test, also 60 items in length, was administered 
as the regular final examination. These two 
tests, one administered on the first day and 
another on the last day of training, served as 
bases for computing two gains scores: residual 
gain and crude gain. Scores on the second 
test alone were used as the measure of final 
standing in the course. 







Table 1

Validities, Intercorrelations, Means, and Standard Deviations of
Predictor Variables (N = 213)

Predictors (rows): 1. General Classification Test (GCT); 2. Arithmetic
Test (ARI); 3. Mechanical Test (MECH); 4. Initial Trial Scores, Learning
Test (ILT); 5. Final Trial Scores, Learning Test (FLT); 6. Residual Gain
on Learning Test (RGLT).
Columns: Criteria (Crude Gain, Residual Gain, Final Status); Predictors
(GCT, ARI, MECH, ILT, FLT, RGLT); M; SD.
Footnotes mark significance at the .05, .01, and .001 levels of confidence.


Three Navy Basic Battery Tests and the 
DuBois-Bunch Learning Test (2) served as 
predictors. The General Classification Test 
(GCT) may be regarded as a test of ver- 
bal ability. The Arithmetic Test (ARI) con- 
sists of items involving arithmetic computa- 
tion and problem solving. The Mechanical 
Test (MECH) is made up of items involv- 
ing mechanical and electrical knowledge and 
comprehension and application of mechanical 
principles. The DuBois-Bunch Learning Test 
is a simple perceptual learning task, adapted 
for group administration. It consists of ten 
90-second trials with a 30-second rest pause 
between trials. 


Results and Discussion 


Table 1 presents the validities, intercorre- 
lations, means, and standard deviations for 
the six predictor variates. The standard error 
of an r of .00, for an N of 213, is .069. 
Mean performance on the proficiency tests 
increased from 22.7 items correct for the pre- 
test, to 47.2 for the post-test. The standard 
deviation of pretest scores was 5.51; for post- 
test scores it was 5.89. The correlation be- 
tween pretest and post-test scores was .41. 
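The standard error quoted for an r of .00 can be reproduced with the usual large-sample expression 1/sqrt(N - 1). That formula is an assumption here, since the article gives only the resulting value.

```python
from math import sqrt


def se_of_zero_r(n):
    """Large-sample standard error of r when the population r is zero."""
    return 1.0 / sqrt(n - 1)


se_213 = se_of_zero_r(213)  # N = 213 trainees
```

On this basis an observed r would need to exceed roughly twice this value in absolute size to differ from zero at the .05 level.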


Navy Basic Battery Test scores are standard
scores, based upon a normative sample, with
means of 50 and standard deviations of 10.
Scores of Ss in our sample averaged slightly
higher than these theoretical values and also
exhibited some restriction. However, none of
the correlations in Table 1 was corrected for
restriction in range or for unreliability of
predictors.

Split-half reliabilities of the three criteria 
were .56 for crude gain, .67 for residual gain 
and .77 for final status. Because the criteria 
differ in terms of reliability, comparisons of 
their relative predictability might be obscured. 
The italicized correlations found in Table 1 
are validity coefficients of predictors for cri- 
terion measures, reliabilities of which have 
been corrected to unity. 
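Correcting a validity coefficient for unreliability of the criterion alone divides the observed r by the square root of the criterion reliability. The sketch below applies this to the three split-half reliabilities given in the text. The observed validity of .30 is a made-up value for illustration, and the function name is not from the article.

```python
from math import sqrt


def correct_for_criterion_unreliability(r_xy, r_yy):
    """Validity the predictor would show against a perfectly reliable criterion."""
    return r_xy / sqrt(r_yy)


# Split-half reliabilities reported for the three criteria.
reliabilities = {"crude gain": 0.56, "residual gain": 0.67, "final status": 0.77}

# Hypothetical observed validity of .30 against each criterion.
corrected = {
    name: correct_for_criterion_unreliability(0.30, r_yy)
    for name, r_yy in reliabilities.items()
}
```

The same observed r is raised most for the least reliable criterion, which is why the corrected values are needed before the three criteria can be compared fairly.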

Inspection of the validities contained in the 
left-hand portion of Table 1 indicates differ- 
ences in the predictability of the three cri- 
teria. Correlations of predictors with the 
crude gains criterion were generally quite low. 
In only one instance did the obtained validi- 


Table 2 
Multiple Prediction of Three Criteria of 


Student Performance 


Residual 


Gain 


Final 
Status 


Crude 


Predictors Gain 


GCT, ARI, MECH 14 .38* .50* 
ILT, FLT 14 .28* 35* 
GCT,ARI, MECH,ILT, FLT .17 40* 52° 
GCT, ARI, MECH, RGLT 16 39 Rvs 
GCT, MECH, FLT 5 39" we 


* Significant at .001 level of confidence. 


ties differ from zero sufficiently to be sta- 
tistically significant. Correlations of pre- 
dictors with the residual gains criterion were 
lower than those of final status, but all were 
of sufficient magnitude to be significant at the 
.01 level or beyond. Correlations of final
status with predictors were all highly signifi- 
cant. Furthermore, inspection of corrected 
validities indicates that these differences can- 
not be wholly attributed to differences in the 
reliabilities of the criteria. 

Another comparison of the predictability of 
the three criteria may be made by inspection 
of Table 2. Multiple correlations between 
each of the three criteria and various com- 
binations of predictors were computed, and 
the significance of these multiple correlations 
then tested by means of analysis of variance. 
None of the multiple correlations of crude 
gain was significant. All multiple R’s with 
residual gain were significant beyond the .001 
level. This was also true, of course, for 
multiple correlations with final status. 

Another question concerned the hypothe- 
sis that measures derived from the DuBois- 
Bunch Learning Test would increase signifi- 
cantly the multiple correlation obtainable 
using only the Navy Basic Battery Tests. It 
was felt that measures of performance from 
a simple learning task might increase signifi- 
cantly the prediction of gain in a training 
course beyond that obtainable by more con- 
ventional aptitude tests. The increase in the 
multiple correlation, .02 at best, was not sig- 
nificant. It is of interest to note, however, 
that one of the best predictors of final stand- 
ing and of residual gain in the training course 










was the final trial score on this simple learn- 
ing task. 


Summary 


A decision concerning which criterion, gain 
or final grade, should be adopted in a par- 
ticular training situation rests primarily upon 
logical considerations. However, in correlat- 
ing aptitude measures with final grade, it is 
quite possible that overlap of nonvalid vari- 
ance, such as verbal facility and test-wiseness, 
may in some situations lead to spuriously high 
validity coefficients. 

In contrast to this, residual gain represents 
the portion of the post-training measure which 
is uncorrelated with initial status. As a con- 
sequence, some of the nonappropriate variance 





Winton H. Manning and Philip H. DuBois 


may have been removed from the criterion. 
In this sense, a criterion of residual gain for 
test validation may be more realistic than the 
more frequently adopted criterion of final 
standing, while at the same time avoiding 
inconsistencies encountered when the crude 
gain measure is used. 
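
The distinction between the two gain criteria can be made concrete with the summary statistics reported above. The sketch below assumes that residual gain is the post-test score minus the post-test score predicted from the pretest by linear regression; the function names are illustrative and the derived quantities are computed here, not taken from the article.

```python
import math

# Summary statistics reported in the text.
PRE_MEAN, POST_MEAN = 22.7, 47.2
PRE_SD, POST_SD = 5.51, 5.89
R_PRE_POST = 0.41

# Regression slope of post-test on pretest.
slope = R_PRE_POST * POST_SD / PRE_SD

def crude_gain(pre, post):
    """Simple difference between post-test and pretest."""
    return post - pre

def residual_gain(pre, post):
    """Post-test score minus the score predicted from the pretest."""
    predicted_post = POST_MEAN + slope * (pre - PRE_MEAN)
    return post - predicted_post

# Standard deviations implied by the reported statistics.
sd_crude = math.sqrt(PRE_SD**2 + POST_SD**2 - 2 * R_PRE_POST * PRE_SD * POST_SD)
sd_residual = POST_SD * math.sqrt(1 - R_PRE_POST**2)

# Correlation of each gain measure with the pretest.
r_crude_with_pre = (R_PRE_POST * POST_SD - PRE_SD) / sd_crude
r_residual_with_pre = 0.0  # zero by construction of the regression residual

print(round(slope, 3), round(sd_crude, 2), round(sd_residual, 2))
print(round(r_crude_with_pre, 2), r_residual_with_pre)
```

With these figures crude gain would correlate about -.50 with initial status, whereas residual gain is uncorrelated with it by construction.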


Received September 25, 1957. 


References 


1. DuBois, P. H. Multivariate correlational analysis. New York: Harper, 1957.

2. DuBois, P. H., & Bunch, Marion E. A new technique for studying group learning. Amer. J. Psychol., 1949, 62, 272-278.

3. Woodrow, H. The ability to learn. Psychol. Rev., 1946, 53, 147-158.





Journal of Applied Psychology 
Vol. 42, No. 3, 1958 


A Note on the Reliability and Validity of the Minnesota 
Scale for Paternal Occupations as an Estimate of 
Family Economic Status 


John L. Holland 


National Merit Scholarship Corporation 


The purpose of this note is to report tests of the interobserver reliability of the Minnesota Scale for Paternal Occupations (2) and its validity as an estimate of family economic status or father’s net income. The data were obtained in the execution of the National Merit Scholarship program (1) and represent approximately a 20% sample of 4000 Finalists in the 1955-56 program whose parents filled out Parents Confidential Statements for the College Scholarship Service, Princeton, New Jersey.

To estimate interobserver reliability, two 
judges classified independently fathers’ occu- 
pations for a sample of 200 forms filled out 
by both parents. The percentage of agree- 
ment between judges is 73.5, i.e., judges clas- 
sify 73.5% of the occupational titles in the 
same scale interval. The product-moment 
correlation between raters is .84 for an N of 
150. Cases which were not coded by one or 


both judges because of lack of sufficient oc- 
cupational information are excluded from this 
correlation. 

The validity of the Minnesota scale as a 
measure of family financial status is shown 
in Table 1 by income distributions for each 
class or scale interval. Income in this study 
is “net income” taken from the Parents Confi- 
dential Statement which is an elaborate finan- 
cial summary of family income, insurance, 
expenses, income tax, etc. No published stud- 
ies exist which document the relationship be- 
tween this reported net income and independ- 
ent estimates made through credit checks, but 
authorities feel that these statements have 
substantial accuracy based on their experience 
with limited numbers of credit checks and the 
internal checks existing within the form itself.¹


1 In a personal communication, William R. Reed,
Assistant Commissioner of Athletics, Big Ten Inter- 
collegiate Conference, reports that parent income 


Table 1

[Body of the table not recoverable from the scan. The table gives, for each class of the Minnesota scale, the distribution of net income in intervals from $1,000 to $25,000+. Only the summary rows are legible.]

Class                         N              Median income

I                             [illegible]    9,370.9
II     Semi-prof.             232            7,911.7
III    Clerical, skilled      118            6,262.7
IV     Farmers                30             3,166.1
V      Semi-skilled           113            5,424.5
VI     Slightly skilled       21             4,899.5
VII    Day workers            8              3,666.1
Unclass.                      105            5,970.9

Totals                        798            7,023.5


In Table 1, the rank-order correlation be- 
tween the scale values and median incomes is 
.78. Only Class IV, Farmers, is out of ex- 
pected order. This discrepancy may occur 
since the computation of farmer income is in 
several ways not comparable with the incomes 
of other occupational groups or because Class 
IV should be ranked lower in the scale. Larger 
samples which are more representative of the 
general population are needed to explore these 
hypotheses. 
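
The rank-order correlation can be checked with a short calculation. The sketch below assumes Spearman's formula and assumes that the seven legible class medians belong to Classes I through VII in the order shown in the code, a pairing that the damaged table does not fully confirm.

```python
# Median net income by scale class (I = 1 ... VII = 7).
# The pairing of medians with classes is an assumption; see the lead-in.
medians = {
    1: 9370.9,
    2: 7911.7,
    3: 6262.7,
    4: 3166.1,  # Farmers
    5: 5424.5,
    6: 4899.5,
    7: 3666.1,
}

def ranks_descending(values):
    """Rank values so that the largest value receives rank 1 (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks

def spearman_rho(x_ranks, y_ranks):
    """Spearman rank-order correlation for untied ranks."""
    n = len(x_ranks)
    d_squared = sum((a - b) ** 2 for a, b in zip(x_ranks, y_ranks))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

scale_ranks = list(medians.keys())
income_ranks = ranks_descending(list(medians.values()))
rho = spearman_rho(scale_ranks, income_ranks)

print(income_ranks, round(rho, 3))
```

Under this pairing only Class IV departs from its expected rank, and the coefficient comes to roughly .786, in line with the reported .78.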

The extreme variations in income within classes shown in Table 1 reveal that class membership is an unreliable measure of individual income. For studies employing large samples, however, groups of classes appear useful as measures of mean income. For example, a cutting point of $7,000 or more includes 62% of the persons classified in Classes I-III and only 16% of the persons classified in Classes IV-VII.

1 (continued) statements possess a high degree of accuracy when evaluated by an independent credit check. A review by Dun and Bradstreet of every 25th case from a sample of 1275 statements reveals that 50 of 51 statements are accurate reflections of family financial status.

Corroborative comment has been received also from Richard G. King, Associate Director of Admissions and Financial Aid, Harvard College; Rexford G. Moon, Jr., Director, College Scholarship Service; and John U. Munro, Director, Financial Aid Center, Harvard College.


Summary 


As a set of classes, the Minnesota Scale for 
Paternal Occupations appears useful as a 
crude index of fathers’ net income for group 
purposes. The classification process itself has 
relatively high interobserver reliability for a 
sample of two judges. 


Received September 25, 1957. 


References 


1. National Merit Scholarship Corporation. First 
Annual report, 1955-1956. Evanston, Illinois: 
Author. 

2. University of Minnesota, Institute of Child Wel- 
fare. The Minnesota Scale for Paternal Oc- 
cupations. Minneapolis: Author. 





Journal of Applied Psychology 
Vol. 42, No. 3, 1958 


A Hierarchical Factor Analysis of Foreman Behavior ¹


John A. Creager and Francis D. Harding, Jr. 


Air Force Personnel and Training Research Center 


In recent years several factor analyses of 
supervisory behavior have been accomplished 
(2, 4, 7). The chief intent of these studies 
has been to develop techniques for evaluating 
supervisory behavior. The information ob- 
tained from such analyses should provide a 
clearer picture of the practices of supervisory 
behavior. Most such studies involve the col- 
lection of ratings on a number of traits where 
the raters are at a higher supervisory level 
than the ratees. The structure of such rat- 
ings is liable to be somewhat obscured by 
“halo” effects; consequently, a form of fac- 
tor analysis which explicitly separates “halo” 
from the pertinent behavior dimensions is desirable.
Such a method of factor analysis has
recently become available and is known as 
the hierarchical model (5). 

The purpose of this article is to illustrate 
the application of the hierarchical model to 
the analysis of intercorrelations developed 
from a check list of foreman behavior. 

The check list was designed to measure six 
aspects of foreman behavior: Human Rela- 
tions, Job Instruction, Planning and Control, 
Policy and Procedure, Technical Job Knowl- 
edge, and Personal Characteristics. Since 
these areas were arbitrarily chosen as being 
logical divisions of foreman performance, it 
was decided to factor analyze the Foreman 
Check List to determine the independence of 
these components. 


Procedure 


Sample. One hundred forty-one foremen from 23 
companies were rated by their immediate supervisors. 
The foremen rated were engaged in supervising pro- 
duction type activities. The purpose of the ratings 
was to obtain research data and not to provide a 
basis for administrative action. 

The foreman check list. Each of the 81 statements 
of the check list was rated in terms of how well it 

1 The computational work for this study was car- 
ried out under the Air Force Personnel and Training 
Research Center in support of Project 7719, Task 
17050. Permission is granted for reproduction, trans- 
lation, publication, and use or disposal in whole or 
in part by or for the United States Government. 


described the performance of the foreman being 
evaluated. The following scale was used: Strongly 
characteristic of his performance, Moderately char- 
acteristic of his performance, Slightly characteristic 
of his performance, and Does not apply to this fore- 
man’s job. The ratings were weighted 3, 2, 1, and 0, 
respectively, in scoring the check list. 

The factor analysis. For the purposes of the fac- 
tor analysis, the check list statements were divided 
into 18 variables made up of four or five statements 
each. Each of the six aspects of foreman behavior 
was represented by three such variables. The nu- 
merical ratings on the statements making up each 
variable were averaged over nonzero ratings to ob- 
tain the foreman’s score on each variable. Pearson 
product-moment correlations were computed among 
the 18 variables. 
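
The scoring rule for one variable can be sketched as follows. The rating labels and the function name are illustrative, and returning no score when every statement is marked as not applicable is an assumption, since the source does not say how that case was handled.

```python
# Weights given in the text: 3, 2, 1, and 0 for the four scale points.
RATING_WEIGHTS = {
    "strongly": 3,
    "moderately": 2,
    "slightly": 1,
    "does_not_apply": 0,
}

def variable_score(ratings):
    """Average the weights of the statements in one variable over nonzero ratings.

    Returns None when no statement applies (an assumed convention).
    """
    weights = [RATING_WEIGHTS[rating] for rating in ratings]
    nonzero = [w for w in weights if w > 0]
    if not nonzero:
        return None
    return sum(nonzero) / len(nonzero)

# A hypothetical foreman rated on a four-statement variable.
example = ["strongly", "slightly", "does_not_apply", "moderately"]
print(variable_score(example))
```

Averaging over nonzero ratings keeps a statement that does not apply to the foreman's job from lowering his score on that variable.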

The factor analysis model chosen for analysis of 
the correlation matrix was the hierarchical model re- 
cently proposed by Schmid and Leiman (5). This 
choice was based on considerations of the pertinent 
features of this model, in relation to control of 
“halo” effects possibly obscuring the sought dimen- 
sions of foreman behavior. Due to the recent publi- 
cation of the model, applications will not be found in 
the technical literature; hence it is necessary to make 
a brief digression into the pertinent features of the 
model and how it may be applied. Readers inter- 
ested in the mathematical details of the model may 
consult the original article. 

The hierarchical factor model is a generalization of the multiple group and Holzinger bi-factor methods discussed in several basic factor analysis texts (1, 3, 6). One starts with a hypothesis regarding the number and definition of the primary factors. This hypothesis may come from behavior theory, from review of prior studies, or may be generated by a preliminary examination of the correlation matrix. The six categories grouping the 18 variables discussed in the previous section of this paper constitute one such hypothesis. On the basis of the hypothesis proposed, the multiple group method of factoring is applied to the obtained correlations up to the point where the intercorrelations among the primary factors (matrix φ₁) are obtained. This matrix of first-order primary factor correlations is examined for plausibility of the original hypothesis. If some of the correlations among primary factors are extremely high, the original grouping hypothesis may be rejected and a new one generated involving fewer, less correlated groups. The first-order analysis yields directly the primary structure, S₁, which is the matrix of correlations of each variable with each group factor. The multiple-group analysis also yields the group intercorrelations, φ₁. By computing the inverse of φ₁ and premultiplying this inverse matrix by the structure matrix, the primary pattern matrix, P₁, is formed. This matrix contains the projections of each test on each factor. The factors will usually be oblique. In the rare situation in which they are orthogonal, the hierarchical model is irrelevant.

The process described in the preceding paragraph is then repeated on the first-order factor correlation matrix, φ₁, to yield a second-order solution consisting of matrices S₂, φ₂, and P₂. This process is continued in the hierarchical model until one obtains a Spearman-type, or single common factor matrix, φₖ, where k is the “order” of the solution. It is unlikely that k will exceed 3 in a well-designed factor analytic experiment. The hierarchical factor matrix is then obtained by matrix multiplication procedures. The final result is an orthogonal factor matrix consisting of a general factor, some large-group factors (if k is greater than 2), and several small-group factors corresponding to those originally hypothesized. This factor matrix may be examined for plausibility of the original hypothesis and for interpretability. If k equals 2, the Holzinger bi-factor pattern emerges as a special case. Rotations may be applied where indicated; the introduction of rotations implies a rejection of the original hypothesis and a search for an improved fit between the model and the data. However, this does not apply when the rotations are of a “clean-up” nature and the interpretations of the factors are not changed.


Table 1

Intercorrelations Among Foreman Check List Variables and Final Residuals

[Correlation matrix not recoverable from the scan. The 18 variables are: 1-3, Human Relations I-III; 4-6, Job Instruction I-III; 7-9, Planning & Control I-III; 10-12, Policy & Procedure I-III; 13-15, Technical Job Knowledge I-III; 16-18, Personal Characteristics I-III.]




Before proceeding with application of the hier- 
archical model to the intercorrelations of the Fore- 
man Check List ratings, it may be noted that the 
hierarchical model achieves an orthogonal solution 
expressing all levels of factor relationship that may 
exist in the data. All levels of relationship may be 
expressed explicitly in a single factor matrix. In ap- 
plications to correlations developed from ratings, the 
separation of “halo” variance from other sources of 
variation, in which one is usually more interested, 
may be achieved by this model. This is the reason 
for its use in the present study. 


Results and Discussion 


The matrix of intercorrelations among the 
18 variables is presented in Table 1. The ap- 
plication of the model to the data of the 
present study was initiated by replacing the 
unit diagonal elements of the original correla- 
tion matrix with communalities estimated by 
Thurstone’s Formula 15 (6, p. 300). The six 
categories of variables discussed in the previ- 
ous section formed the initial groups defining 
multiple-group factors. Correlations among 
first-order primary factors ranged from .52 to 
.92, with a median r of .75. Although it 


Table 2 


First-Order Solution in Hierarchical Analysis of Foreman Check List 


Structure S₁
First Order Factor 


Variable 7 IIT 


Pattern P₁
First Order Factor 


I II Il IV 





787 450 
749 408 
658 
855 
897 523 
878 
813 608 
665 





739 216 —015 —092 
920 —009 — O84 003 
&34 — 205 098 090 
028 839 —096 
152 820 
—179 969 128 
— 292 681 627 
— 650 695 


730 685 882 —015 158 
737 546 599 529 231 
737 408 662 199 Beas 
760 608 638 010 605 
389 494 842 567 — 100 080 
505 585 892 732 —054 —007 
505 559 634 152 —072 
553 623 669 799 — 202 181 
577 627 688 894 — 206 045 
597 551 515 764 221 — 205 


Note.—Decimal points have been omitted. Pattern values for summed-over variables in italics. 










Table 3

First-Order Factor Correlations and Second-Order Residuals

          I         II        III       IV

I       1.000
II       .879     1.000
III      .538      .630
IV       .735      .752

[The III-IV correlation, the remaining diagonal entries, and the second-order residuals are not legible in the scan.]

Note.—First-order factor correlations, φ₁, in lower left portion of the matrix; second-order residuals in upper right portion of the matrix.


would be possible to proceed, later reducing 
the number of factors by residualizing rota- 
tions, it is more efficient at this stage to reject 
the six-factor hypothesis. This was done and 
the first-order solution repeated with four hy- 
pothesized factors defined by summing over 
the following clusters: 


I. Variables 1, 2, 3 
II. Variables 4, 5, 6 
III. Variables 13, 14, 15 
IV. Variables 9, 17, 18 


The resulting primary structure, S₁, and primary pattern, P₁, for the first order of the hierarchical solution are presented in Table 2. The first-order factor correlations, φ₁, are presented in Table 3, with the residuals from second-order factoring.

In the second-order solution, it was initially hypothesized that the first-order primary factor correlations could be accounted for by a single factor and the Spearman formulas fitted to the data. Although the fit was fairly close, difficulty was experienced in reproducing the original correlation matrix, R, from the bi-factor solution resulting from this hypothesis. Postulating two second-order factors reduced both first- and second-order residuals more nearly to zero. The second-order solution was iterated to stabilize communality estimates for the φ₁ matrix. Two second-order factors were defined by summing over:


A. First-order Factor I, II 
B. First-order Factor III, IV. 


The resulting primary structure, S₂, primary
pattern, P₂, and correlation between the two


Table 4

Second-Order Factor Solution of the Foreman Check List

First-Order     Second-Order Structure S₂      Second-Order Pattern P₂
Factor               A            B                 A          B

I               [illegible]  [illegible]          .912      -.010
II              [illegible]  [illegible]          .962       .011
III             [illegible]  [illegible]         -.043       .815
IV              [illegible]  [illegible]          .043       .917

Note.—Second-order factor correlation of .818 for r_AB constitutes the off-diagonal element of φ₂; diagonal elements are unity until reduced for the third-order solution.


second-order factors are presented in Table 4 
for the second-order of the hierarchical solu- 
tion. 
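
The relation between structure and pattern described in the Procedure section, pattern equals structure postmultiplied by the inverse of the factor correlation matrix, can be illustrated with the legible second-order values. The sketch below takes the pattern loadings and the correlation of .818 from Table 4, computes the implied structure as S₂ = P₂φ₂, and then recovers the pattern from it. The helper functions are illustrative, and the resulting structure values are computed here rather than read from the table.

```python
# Second-order pattern P2 (rows: first-order Factors I-IV; columns: A, B).
P2 = [
    [0.912, -0.010],
    [0.962, 0.011],
    [-0.043, 0.815],
    [0.043, 0.917],
]

# Correlation matrix of the two second-order factors.
PHI2 = [
    [1.0, 0.818],
    [0.818, 1.0],
]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def inverse_2x2(m):
    """Invert a 2 x 2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [
        [m[1][1] / det, -m[0][1] / det],
        [-m[1][0] / det, m[0][0] / det],
    ]

# Structure implied by the pattern: S2 = P2 * PHI2.
S2 = matmul(P2, PHI2)

# Pattern recovered from the structure: P2 = S2 * inverse(PHI2).
P2_recovered = matmul(S2, inverse_2x2(PHI2))

for row in S2:
    print([round(value, 3) for value in row])
```

Because the two second-order factors correlate .818, each first-order factor shows a substantial structure value on both of them even where its pattern loading on one is near zero.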

The third-order solution, factoring φ₂, requires a single common factor with corresponding unique factor loadings for each of the second-order factors. This third-order factor matrix, E₃, with both common and unique factors is used to rotate the second-order factor matrix by premultiplication of E₃ by P₂, the second-order primary pattern. The unique factors are appended to the matrix resulting from this product, thus forming matrix E₂. The hierarchical solution, E₁, is then obtained by premultiplying E₂ by P₁, the first-order primary pattern. Matrices E₃ and E₂ are presented in Table 5. Matrix E₁, the hierarchical solution, except for uniqueness loadings, is presented in Table 6.
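
The step from the third-order solution to E₂ can be sketched numerically. With only two second-order factors, a single common factor is not uniquely determined; the sketch assumes equal loadings of sqrt(.818) for A and B, which agrees with the legible communality (.818) and unique loading (.427) in Table 5. Only the product P₂E₃ is computed; the unique factors that the authors append to form the full E₂ are not recoverable and are omitted.

```python
import math

# Second-order pattern P2 from Table 4 (rows: first-order Factors I-IV).
P2 = [
    [0.912, -0.010],
    [0.962, 0.011],
    [-0.043, 0.815],
    [0.043, 0.917],
]

R_AB = 0.818  # correlation between second-order factors A and B

# Third-order matrix E3: one general factor plus a unique factor for A and for B.
# Equal general-factor loadings are an assumption (see the lead-in).
general_loading = math.sqrt(R_AB)
unique_loading = math.sqrt(1 - R_AB)
E3 = [
    [general_loading, unique_loading, 0.0],  # second-order Factor A
    [general_loading, 0.0, unique_loading],  # second-order Factor B
]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

# Columns of the product: general factor G, then orthogonalized A and B.
E2_common = matmul(P2, E3)

for row in E2_common:
    print([round(value, 3) for value in row])
```

Under this assumption the G and A columns agree with the legible entries of Table 5 to within rounding (for example, about .816 and .389 for first-order Factor I).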


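Table 5

Matrices E₃ and E₂ in Development of the Hierarchical Solution

Matrix E₃ (legible entries only): h² = .818; unique-factor loading = .427.

Matrix E₂ (legible columns only)

First-Order
Factor            G          A

I               .815       .389
II              .880       .411
III             .698      -.018
IV              .869       .018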















Examination of the hierarchical factor 
matrix E₁ reveals a few indicated rotations
which might improve the fit between the 
model and the data. Five orthogonal rota- 
tions were made by the Zimmerman graphic 
method (8). This resulted in residualization 
of one of the orders of the hierarchy and indi- 
cates that a better original grouping would 
have yielded a bi-factor pattern, such as re- 
sulted from the rotations. In retrospect, the 
authors realize that the first-order factor cor- 
relations indicate a regrouping on a three- 
factor hypothesis. It was decided to present 
the actual operations used to illustrate how 
the method would be used if higher orders 
are required and to illustrate how an error of 
judgment can be rectified without redoing the 
entire analysis. The final rotated factor 
matrix is presented in Table 7. 

Residuals at each order of factoring are pre- 
sented in the upper right portion of each order 
correlation matrix, those in Table 1 being the 
final residuals computed from the interpreted 
factors of the rotated solution. Some evi- 
dence of a weak factor defined by Variables 
10, 11, and 12 exists in these residuals. In 




the original six factor grouping, the group de- 
fined by these variables was highly correlated 
with that defined by Variables 4, 5, and 6. 
The group yielding the residual cluster was 
not retained. The size of the residuals in this
cluster is within acceptable limits for residuals,
but the fact that they do form a
small meaningful cluster indicates need for 
further investigation to ascertain whether such 
a factor can be confirmed in future experi- 
ments. 

Interpretation. The general factor repre- 
sents “halo” effects plus other sources of fac- 
tor intercorrelations in the rating data. The 
remaining three nonresidual factors yield a 
good approximation to orthogonal simple 
structure. The first group factor, defined by
the first six variables, those in the Human
Relations and Job Instruction categories,
defines a Social Relations factor in foreman
behavior as rated by the Check List. Also
loaded on this factor are the variables deal- 
ing with communications of Policy and Pro- 
cedures. The second group factor is defined 
exclusively by the three Technical Job Knowl- 
edge variables. The third factor is defined 


Table 6 


Hierarchical Factor Matrix E₁


Variable ; i B 


—043 
—032 
065 
001 
—027 
039 
205 
284 
306 
024 
058 
083 


CONIA NE WHE! 


—006 


II 
052 
—002 
—049 
201 
196 
232 
163 
166 
038 
055 
106 
145 


—010 
039 
190 


226 
—034 
112 
008 


227 


004 


—033 
060 
—027 
216 
297 
247 


281 
349 
275 
327 
399 
289 


—043 
—023 
065 
— 087 
—O88 
095 


019 580 
487 
561 
084 
026 

—051 


666 
608 
706 
752 
652 


765 
669 
824 
584 


— 044 


O18 —049 


Note.—Decimal points have been omitted; complete E₁ matrix would include uniqueness loadings not shown here.







Table 7 
Rotated Factor Matrix 








Variable 


G’ I’ 


Il’ IIT’ IV’ hee 





583 
546 


621 
645 
528 
AYO 
509 
377 
239 
030 
221 
504 
405 
345 
074 
103 
178 
104 
074 
205 


712 
730 
773 
818 
766 
770 
585 
O44 
674 
531 
645 
566 
701 
747 
599 


CONAN EWNHe 


— 046 
—008 

041 
—038 
—018 


—025 
—062 
082 
060 
—040 
—012 
006 
—025 
142 
124 
—125 
150 
645 
585 
626 
206 
182 
064 


— 044 
—109 756 
614 
731 
814 
789 
826 
799 
794 
620 
620 
604 
712 
806 
768 
669 
824 
583 


135 
129 
170 
290 
420 096 
—086 
—041 

030 

066 
—091 
— 136 
— 146 
—069 
—117 
—170 


—044 


—003 
134 
002 
344 
461 
375 





                                   Social       Technical Job    Administrative
                         G'        Relations    Knowledge        Skills            IV'       Total

Variance               8.030       2.425        1.321             .910             .230     12.916
% of total common      62.17       18.78        10.23             7.05             1.78     100.01
% of nonhalo common      —         49.63        27.04            18.62             4.71     100.00





Note.—Decimal points have been omitted. 


* Communalities computed including two smaller residual factors, not shown here. Sum of communalities so computed is 
13.070. Percentage variance computed on the basis of reported factors. 


by Planning and Control, and Personal Char- 
acteristics variables. Examination of the items 
labeled “Personal Characteristics” reveals that 
these characteristics involve the foremen’s own 
administrative behavior as supervisor. This 
factor is designated as an Administrative 
Skills factor. 

Examination of the relative amounts of 
variance on these factors accounting for the 
intercorrelations among the 18 rated foremen 
characteristics shows 62.17% of the common 
variance being attributable to “halo.” The 
remaining common variance is neatly struc- 
tured into three interpretable components. 
The Social Relations component accounts for 
18.78% of the total common variance, or 
49.63% of the nongeneral common variance; 
the Technical Job Knowledge component ac- 
counts for 10.23% of the total common vari- 
ance, or 27.04% of the nongeneral common 
variance. The Administrative Skills compo- 
nent accounts for 7.05% of the total com- 


mon variance, or 18.62% of the nongeneral 
common variance. 
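
The percentages quoted above follow directly from the factor variances in Table 7. The sketch below recomputes them; the dictionary keys are labels taken from the interpretation in the text, and the label for the fifth reported factor is only a placeholder.

```python
# Variance accounted for by each reported factor (from Table 7).
variances = {
    "general (halo)": 8.030,
    "Social Relations": 2.425,
    "Technical Job Knowledge": 1.321,
    "Administrative Skills": 0.910,
    "fifth reported factor": 0.230,
}

total_common = sum(variances.values())
nonhalo_common = total_common - variances["general (halo)"]

# Share of the total common variance carried by each factor.
pct_of_total = {
    name: round(100 * value / total_common, 2) for name, value in variances.items()
}

# Share of the nongeneral (nonhalo) common variance, excluding the general factor.
pct_of_nonhalo = {
    name: round(100 * value / nonhalo_common, 2)
    for name, value in variances.items()
    if name != "general (halo)"
}

print(round(total_common, 3))
print(pct_of_total)
print(pct_of_nonhalo)
```

The recomputed values reproduce the reported 62.17% for the general factor and the paired percentages for the three interpreted components.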


Summary 


Ratings of industrial foremen were made 
using a check list. Scores on 18 variables rep- 
resenting six hypothesized aspects of super- 
visory behavior were factor analyzed using 
the hierarchical factor model. A detailed ex- 
planation of the application of the hierarchi- 
cal model is given. In terms of the present 
data, four factors were found, one a general 
or “halo” factor and three interpretable fac- 
tors. These were described in terms of their 
meaning and variance. The factors were 
designated as Social Relations, Technical Job 
Knowledge, and Administrative Skills and are 
similar to several previously reported. 

It is concluded that the hierarchical factor 
model is a useful technique for the analysis of 
intercorrelations of trait ratings. 


Received October 14, 1957. 







References 


1. Fruchter, B. Introduction to factor analysis. New York: Van Nostrand, 1954.

2. Grant, D. L. A factor analysis of managers’ ratings. J. appl. Psychol., 1955, 39, 283-286.

3. Holzinger, K. J., & Harman, H. H. Factor analysis. Chicago: Univer. Chicago Press, 1941.

4. Roach, D. E. Factor analysis of rated supervisory behavior. Personnel Psychol., 1956, 9, 487-498.

5. Schmid, J., & Leiman, J. M. The development of hierarchical factor solutions. Psychometrika, 1957, 22, 53-61.

6. Thurstone, L. L. Multiple-factor analysis. Chicago: Univer. Chicago Press, 1947.

7. Wilson, R. C., High, W. S., Beem, H. P., & Comrey, A. L. A factor-analytic study of supervisory and group behavior. J. appl. Psychol., 1954, 38, 89-92.

8. Zimmerman, W. S. A simple graphical method for orthogonal rotation of axes. Psychometrika, 1946, 11, 51-55.








Journal of Applied Psychology
Vol. 42, No. 3, 1958


A Machine Method of Computing Guttman’s Coefficient of 
Reproducibility with a Large Sample 


Robert W. Heath 


Division of Educational Reference, Purdue University 


Despite various criticisms and suggested 
modifications, Guttman’s Coefficient of Re- 
producibility (1, 2, 4, 5) is probably the most 
widely accepted criterion for the scalability of 
a set of items. 

A number of computational methods (3, 6) 
have been developed for this coefficient. How- 
ever, these have generally been limited to 
sample sizes of a maximum of 200 and the 
computational procedures for large numbers 
remain extremely cumbersome and time con- 
suming. 

The method presented here has shown itself 
to permit the computation of Coefficient of 
Reproducibility on samples of 1,000 in ap- 
proximately one-half hour each. The method 
requires that (a) the items’ rank order in the
scale is known and (b) that the items’ alternatives
are dichotomous. If the items are
scalable, or nearly so, the frequency of re- 
sponse to their positive alternatives is an ade- 
quate index of their rank order. Other meth- 
ods of determining item order are given in 
the literature (1, 6). 

This procedure requires the use of a card 
sorter and a tabulator. The response pattern 
of each respondent should be recorded in a 
separate card, the response to each item oc- 
cupying a separate column. For convenience, 
it is desirable that the items be recorded in 
their rank order. 

Assuming that the response patterns are so 
recorded and that the items are punched from 
left to right in rank order with the “most 
positive” item placed in Column 1, it is then 
possible to sort the cards into scale types. 

Consider the following response patterns 
from a five-item perfect scale: 


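[Table of the responses of five respondents in Columns 1 to 5; entries not legible in the scan.]


We can treat the response patterns as if
they were five-digit numbers composed of 1’s
and 0’s only. The response patterns then
appear:

[The same five response patterns written as rows of 1’s and 0’s; entries not legible in the scan.]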


Thinking of the response patterns in this 
way it is a matter of five passes through the 
sorter to arrange the individuals in “numeri- 
cal” or scale type rank. In ascending order 
they would be: 


1|0 0 0 0
1 1|0 0 0
1 1 1|0 0
1 1 1 1|0
1 1 1 1 1

This is the familiar scalogram of a perfect 
scale. Thus the procedure sorts all perfect 
scale types together and in the proper order. 
By listing the cards with the tabulator, the 
scalogram is formed. 
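
In present-day terms, the repeated sorter passes amount to a least-significant-digit radix sort on binary columns. The sketch below assumes the cards are sorted one column per pass, starting with the last column and ending with Column 1, and that each pass keeps the existing order of the cards within a pocket; the source states only that five passes are needed. The function names and the card data are illustrative.

```python
def sorter_pass(cards, column):
    """One pass through the sorter on a single column.

    Cards falling in the 0 pocket are stacked ahead of those in the 1 pocket,
    and the order of cards within each pocket is preserved.
    """
    pocket_0 = [card for card in cards if card[column] == 0]
    pocket_1 = [card for card in cards if card[column] == 1]
    return pocket_0 + pocket_1

def sort_into_scale_types(cards):
    """Sort cards into ascending 'numerical' order, one pass per item."""
    n_items = len(cards[0])
    for column in range(n_items - 1, -1, -1):  # last column first, Column 1 last
        cards = sorter_pass(cards, column)
    return cards

# Five respondents on a five-item perfect scale, in arbitrary order.
cards = [
    (1, 1, 1, 0, 0),
    (1, 0, 0, 0, 0),
    (1, 1, 1, 1, 1),
    (1, 1, 0, 0, 0),
    (1, 1, 1, 1, 0),
]

scalogram = sort_into_scale_types(cards)
for card in scalogram:
    print(*card)
```

The number of passes equals the number of items, whatever the number of cards, which is why the procedure remains practical for samples of 1,000.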

If we add a response pattern containing an 
error, the scalogram would appear: 


[Scalogram of six response patterns, one of them marked “(error pattern)”; entries not legible in the scan.]

The errors stand out in a manner which 
draws attention to the fact that the pattern 
in which they occur must be separately scored. 

When the response patterns have been as- 
signed their proper scores, they are then 




sorted in descending score order and errors 
can be counted with relative ease. 
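
The error count and the coefficient itself can be sketched as follows. The source does not spell out its scoring rule, so the sketch makes two assumptions: each pattern is assigned to the perfect scale type from which it differs in the fewest responses, and the coefficient is one minus the ratio of errors to total responses. The error pattern used is hypothetical.

```python
def perfect_scale_types(n_items):
    """All perfect patterns: k positive responses followed by zeros, k = 0..n."""
    return [tuple([1] * k + [0] * (n_items - k)) for k in range(n_items + 1)]

def count_errors(pattern):
    """Fewest responses that must change to reach some perfect scale type."""
    return min(
        sum(a != b for a, b in zip(pattern, scale_type))
        for scale_type in perfect_scale_types(len(pattern))
    )

def coefficient_of_reproducibility(patterns):
    """One minus errors divided by (respondents x items)."""
    n_items = len(patterns[0])
    total_errors = sum(count_errors(p) for p in patterns)
    return 1 - total_errors / (len(patterns) * n_items)

# Five perfect scale types plus one hypothetical error pattern.
patterns = [
    (1, 0, 0, 0, 0),
    (1, 0, 1, 0, 0),  # error pattern: one response out of place
    (1, 1, 0, 0, 0),
    (1, 1, 1, 0, 0),
    (1, 1, 1, 1, 0),
    (1, 1, 1, 1, 1),
]

reproducibility = coefficient_of_reproducibility(patterns)
print(count_errors((1, 0, 1, 0, 0)), round(reproducibility, 3))
```

Other error-counting conventions exist and can give a different count for the same data, so the rule should be stated whenever a coefficient is reported.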


Summary 


A method of computing Guttman’s Coefficient of Reproducibility is described which requires: (a) as many passes through a sorter as there are items, (b) one pass on a tabulator, (c) one pass for every 10 items on a sorter and (d) a final run on the tabulator. On the average, the coefficient for an N of 1,000 can be computed in one-half hour.


Received October 16, 1957. 


References 


1. Guilford, J. P. Psychometric methods. (2nd ed.) New York: McGraw-Hill, 1954.

2. Guttman, L. The Cornell technique for scale and intensity analysis. Educ. psychol. Measmt, 1947, 7, 247-279.

3. Guttman, L. Mathematical and tabulation techniques. Soc. Sci. Res. Council Bull. No. 48, 251-364.

4. Guttman, L. The basis for scalogram analysis. In S. A. Stouffer, et al., Measurement and prediction. Princeton, N. J.: Princeton Univer. Press, 1950. Pp. 60-90.

5. Menzel, H. A new coefficient for scalogram analysis. Publ. Opin. Quart., 1953, 17, 268-280.

6. Riley, M. W., Riley, J. W., & Toby, J. Sociological studies in scale analysis. New Brunswick: Rutgers Univer. Press, 1954.





Journal of Applied Psychology
Vol. 42, No. 3, 1958 


A General Device Versus More Specific Devices for 
Selecting Car Salesmen 


James E. Kennedy ¹


Bureau of Industrial Psychology, University of Wisconsin 


In reviewing the literature of the past 30 
years concerned with salesmen selection it ap- 
peared that the validity of a device for select- 
ing salesmen is a highly specific thing. Not 
only might the validity be specific for “spe- 
cific” types of sales jobs, such as life insur- 
ance selling compared to retail department 
store selling, but it also might be specific to 
groups of salesmen within these relatively 
specific types of sales jobs, e.g., shoe sales- 
men opposed to hardware salesmen at a re- 
tail department store. Bills (3) working with 
the Strong Vocational Interest Blank in pre- 
dicting success of insurance agents, noted dif- 
ferences in validities for different groups of 
salesmen based upon: (a) previous experience 
compared to no previous experience at selling 
life insurance, (b) age, and (c) preference for 
secondary occupations. Others (1, 2, 5, 9, 
10) have reported similar results. 

A number of authors (4, 6, 7, 8) have sug- 
gested, upon reviewing research done on the 
selection of salesmen, that consideration of 
the nature of each type of sales job should 
yield improvement in validity. 

In general, the purpose of this study was 
to pursue this suggestion and to evaluate the 
relative worth of developing specific selection 
devices for subvarieties of retail car salesmen 
compared with a single selection device for re- 
tail car salesmen in general. More specifically, 
the purpose of the study was: (a) to develop 
an over-all selection instrument for use with 
retail car salesmen in general, (b) to develop 
several specific selection instruments for use 
with six particular subvarieties of car sales- 
men within this group, and (c) to examine the 
validity of each of these instruments in order 
to evaluate the relative worth of constructing 


1 This study was conducted when the author was 
a member of the Personnel Evaluation Services staff 
at General Motors Institute. He extends his thanks 
to Orlo L. Crissey for permission to use the data and 
to his staff members for their many suggestions. 




over-all versus specific selection instruments 
for use with car salesmen. 


Predictors 


The predictors used in the study were 290 
multiple-choice type items. These items may 
be classified with regard to their content as: 
personal or biographical data, personality, in- 
terest, and attitudes. Some of the items were 
modifications of items which had been found 
in previous studies to be related to successful 
job performance; others were original items 
based upon suggestions obtained from field 
interviews with experienced personnel from 
the retail and wholesale sales organizations of 
a large automobile manufacturer for whom 
the study was conducted. 

Under the circumstances, 290 items seemed 
to be too many to administer to a single sales- 
man, hence the items were divided and two 
forms of a questionnaire prepared. These 
trial forms of the questionnaire were referred 
to as Forms A and B. Those items surviving 
the item-analyses to be described below were 
used to compile Form C which was cross- 
validated. 


Population and Samples 


This research was conducted for two divi- 
sions of a large automotive corporation. For 
our purposes, these divisions will be referred 
to as Divisions A and B and their products as 
Car A and Car B. Each division distributes 
its products through independently owned and 
operated retail dealerships located through- 
out the country. 

Three samples of dealerships were selected 
on a random-stratified basis to give equal rep- 
resentation of geographical areas and car 
makes within each sample. These samples 
were referred to as Samples I, II, and III. 
Form A of the questionnaire was adminis- 
tered to salesmen in Sample I and Form B to 
those of Sample II to provide the data for 







item analyses. Form C was administered to 
Sample III for cross-validation. Salesmen 
from all samples were currently employed 
when administered the questionnaire. 

The questionnaires were mailed to the 
dealership where they were distributed to the 
salesmen. Background material and instruc- 
tions were included in the questionnaire; 
salesmen completed the questionnaires with- 
out supervision. Only if a salesman had com- 
pletely filled in the questionnaire and if his 
dealer had forwarded the necessary criterion 
information was the questionnaire usable for 
analysis. The following numbers of ques- 
tionnaires were usable: Form A, 358; Form 
B, 335; Form C, 749. 


Criterion Analysis 


Five measures of job performance were considered initially; these were (each for a standard period of time): number of cars sold, 
gross dollar sales volume, gross dollar earn- 
ings, gross profit returned to the dealer as a 
result of the salesman’s efforts, and super- 
visory ratings. 

It was not possible to assume that there was 
equal opportunity to sell from one dealership 
to another. Also preliminary investigation 
showed that new-car salesmen averaged con- 
sistently higher earnings than used-car sales- 
men at the same dealerships. As a result, the 
following method for collecting criterion meas- 
ures was used. All new-car salesmen within 
a dealership were ranked on each criterion 
measure. The ranks were then dichotomized 
and those in the upper half considered the 
better salesmen and those in the lower half 
were considered the poorer salesmen. The 
same procedure was followed with used-car 
salesmen. 
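The within-dealership ranking and dichotomizing described above can be sketched as follows. This is a minimal illustration, not the procedure actually used to collect the criterion data: the record layout and function name are hypothetical, and the handling of ties and of odd-sized groups (the middle man is placed in the lower half) is an assumption, since the text does not say how such cases were treated.

```python
from collections import defaultdict


def dichotomize_within_groups(records):
    """records: (salesman_id, dealership, kind, earnings) tuples.

    Ranks salesmen on earnings inside each dealership-by-kind group
    (kind is 'new' or 'used') and labels the upper half 'better'.
    """
    groups = defaultdict(list)
    for sid, dealership, kind, earnings in records:
        groups[(dealership, kind)].append((earnings, sid))

    labels = {}
    for members in groups.values():
        members.sort(reverse=True)      # highest earnings first
        n_upper = len(members) // 2     # assumption: odd man falls in lower half
        for rank, (_, sid) in enumerate(members):
            labels[sid] = "better" if rank < n_upper else "poorer"
    return labels


# Hypothetical dealership with four new-car and two used-car salesmen.
records = [
    ("s1", "D1", "new", 900), ("s2", "D1", "new", 700),
    ("s3", "D1", "new", 500), ("s4", "D1", "new", 300),
    ("s5", "D1", "used", 400), ("s6", "D1", "used", 200),
]
labels = dichotomize_within_groups(records)
print(labels)
```

In this illustration s5 is a "better" salesman with lower earnings than s3, a "poorer" one, which is the intended effect of ranking new-car and used-car salesmen separately.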

Intercorrelations (tetrachoric) were com- 
puted between the five measures for new-car 
salesmen and then for used-car salesmen. The 
correlations for both groups were found to be 
sufficiently high that the use of a multiple 
criterion did not appear to be justified. Gross 
earnings was chosen as the sole criterion of 
sales performance because of the ease with 
which the data could be obtained. 
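The report does not say how the tetrachoric coefficients were obtained. As an illustration only, the sketch below uses the familiar cosine-pi approximation for a fourfold table, which is a swapped-in shortcut rather than the author's stated method. The function name and the cell counts are hypothetical.

```python
import math


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric r for a fourfold table.

    Cells: a = upper half on both measures, d = lower half on both,
    b and c = the two discordant cells.
    """
    if b == 0 or c == 0:
        return 1.0      # no discordant cases on one side: treated as +1
    if a == 0 or d == 0:
        return -1.0     # no concordant cases on one side: treated as -1
    return math.cos(math.pi / (1.0 + math.sqrt((a * d) / (b * c))))


# Hypothetical table: 40 salesmen high on both measures, 40 low on both,
# 10 in each discordant cell.
print(round(tetrachoric_cos_pi(40, 10, 10, 40), 3))
```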

Tenure as a salesman was found to be re- 
lated to the criterion measure. This relation- 
ship was strongest during the earliest months. 




In an effort to minimize the effects of tenure, 
salesmen with less than 12 months on the job 
were eliminated from the analysis. 


Subvarieties of Car Salesmen 


Three pairs of subvarieties of car salesmen 
were chosen for special treatment in the study. 
The choice of two of these pairs was based on 
the opinions of personnel from the operating 
situation and the third was based on previous 
research findings. In the field interviews, a 
number of people from the sales organizations 
expressed strong opinions that new-car sales- 
men appealed to a different kind of buying 
public and employed different sales tech- 
niques than used-car salesmen, and that a 
selection device that was designed to pick 
new-car salesmen would be very unlikely to 
work as well for used-car salesmen. Simi- 
larly, many believed that salesmen who suc- 
cessfully sold Car B were quite different from 
those selling Car A. Car B was more expen- 
sive than Car A and it was generally agreed 
that the prestige factor was more important 
in its sale. 

The third subvariety consisted of salesmen 
of different average ages. The inclusion of 
this subvariety was based on work by Bills 
(3) and Wallace (9) each of whom found 
differences in the validity of selection devices 
when they were applied to subsamples of in- 
surance salesmen with different average ages. 

On these bases then, three pairs of sub- 
varieties were chosen. They were: (a) new- 
car salesmen opposed to used-car salesmen, 
(b) Car A salesmen opposed to Car B salesmen, and (c) younger car salesmen (below 
the median age from Sample I) opposed to 
older car salesmen (above the median). 


Item Validity Analyses 


Seven separate item validity analyses were 
conducted for items on Form A as well as for 
Form B. The first of these was referred to 
as the “car salesmen in general” item analy- 
sis. All salesmen from Sample I were di- 
vided into better and poorer salesmen on the 


basis of the criterion. The responses to the 
alternatives to each question on Form A were 
arranged in a j (alternatives) by 2 (criterion 
groups) table and the chi-square technique 
applied. This was repeated for the items on 







Form B using Sample II. Those 40 items 
which had been found to be related most sig- 
nificantly to the criterion were identified. 
Scoring weights of 0, 1, or 2 were assigned to 
the alternatives of these items according to 
the amount their individual cell square con- 
tingencies contributed to the total chi square 
for the item. Those 40 items and their 
weights were referred to as the “Car Sales- 
men in General Key.” 
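The item analysis just described can be sketched for a single item. The chi-square computation on the j by 2 table is standard; the rule for turning cell contributions into weights of 0, 1, or 2 is not spelled out in the text, so the rule below (and its threshold `min_contribution`) is an assumption made for illustration, as are the function names and counts.

```python
def chi_square_item(table):
    """table: one [better_count, poorer_count] row per alternative.

    Returns (chi_square, contribution of each alternative).
    Assumes every row and column total is greater than zero.
    """
    col_totals = [sum(row[k] for row in table) for k in (0, 1)]
    n = sum(col_totals)
    contributions = []
    for row in table:
        row_total = sum(row)
        contrib = 0.0
        for k in (0, 1):
            expected = row_total * col_totals[k] / n
            contrib += (row[k] - expected) ** 2 / expected
        contributions.append(contrib)
    return sum(contributions), contributions


def scoring_weights(table, min_contribution=1.0):
    """Assumed rule: 1 = neutral; 2 = chosen relatively more often by the
    better group; 0 = chosen relatively more often by the poorer group."""
    _, contributions = chi_square_item(table)
    better_n = sum(row[0] for row in table)
    poorer_n = sum(row[1] for row in table)
    weights = []
    for row, contrib in zip(table, contributions):
        if contrib < min_contribution:
            weights.append(1)
        elif row[0] / better_n > row[1] / poorer_n:
            weights.append(2)
        else:
            weights.append(0)
    return weights


# Hypothetical three-alternative item answered by 180 better, 180 poorer men.
item = [[60, 30], [70, 70], [50, 80]]
chi2, parts = chi_square_item(item)
print(round(chi2, 2), scoring_weights(item))
```

Ranking all items by significance and keeping the 40 strongest would then yield a key of the kind described.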

Sample I and Sample II were then divided 
into those salesmen who sold Car A and those 
who sold Car B. The same item-analysis pro- 
cedure described above was repeated for each 
of these groups separately and resulted in a 
“Car A Key” and a “Car B Key.” 

After Samples I and II were re-assembled 
they were then divided into new- and used-car 
salesmen and the item-analysis procedure was 
repeated yielding a “New-Car Key” and a 
“Used-Car Key.” Once again the samples 
were re-assembled, divided into younger and 
older salesmen, the item analysis repeated, 
and a “Young Car Salesman” and “Old Car 
Salesman” Key resulted. 

The sizes of Samples I and II did not per- 
mit the use of mutually exclusive groups for 
the different item analyses. Also, since the 
various subvarieties of salesmen were always 
a part of the larger “car salesmen in general” 
group, the sizes of their item-analysis samples 
were necessarily smaller than that used for 
the “car salesmen in general” item analysis. 
While this provided an advantage to the “car salesmen in general” method, it was felt to be 
the most realistic condition for investigating 
the problem. 

Since the instrument was to be used as a 
screening device, as part of a more elaborate 
selection procedure, value was placed on keep- 
ing the final questionnaire as brief as pos- 
sible. Toward this end, each of the keys was 
limited to the 40 items that showed the most 
significant relation to the criterion in each of 
the item analyses. 

In summary, seven scoring keys were de- 
veloped. One was developed on the entire 
sample and was called the “Car Salesmen in 
General Key.” Six “Subvariety Keys” were 
developed on various subvarieties of salesmen 
and were named according to the types of 
salesmen involved. 




Cross-Validation 


Form C consisted of those items appearing 
on the Car Salesmen in General Key or on 
any of the six subvariety keys and was ad- 
ministered to Sample III. Usable question- 
naires were returned by 749 salesmen for 
whom criterion information was available. 

Sample III was split into seven subsamples 
of approximately 95 cases each. Each sub- 
sample was comprised of appropriate sub- 
varieties of car salesmen for evaluating the 
seven keys. For example, Subsample 1 was comprised of “car salesmen in general” and was used for cross-validation of the “Car Salesmen in General” Key, Subsample 2 was comprised of Car A salesmen only and was used for cross-validation of the Car A Key, Subsample 3 was comprised of Car B salesmen only, etc. Each subsample was selected such that it had the same proportions of the various subvarieties of car salesmen as existed in Sample III as a whole. For example, Subsample 2, “Car A” salesmen, 
consisted of 30% Car A, new-car, young 
salesmen; 34% Car A, new-car, old salesmen; 
18% Car A, used-car, young salesmen; 18% 
Car A, used-car, old salesmen. These were 
the same proportions found for all Car A 
salesmen in Sample III. 

The mechanics for achieving this were to 
subdivide Sample III into pools of question- 
naires consisting of each subtype of salesmen, 
e.g., Car A, new-car, younger salesmen; Car 
A, new-car, older salesmen, etc. Random se- 
lection of cases was drawn from these pools 
to fill the quotas for each subsample. 
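The pooling and quota-filling mechanics can be sketched as below. The function name, pool labels, and questionnaire identifiers are hypothetical. The percentages are those reported for the Car A subsample, but a subsample size of 100 is used instead of the approximate 95 so that the quotas come out as whole numbers.

```python
import random


def draw_subsample(pools, percentages, size, seed=0):
    """Fill each subtype's quota by random selection from its pool.

    pools: {subtype: list of questionnaire ids}
    percentages: {subtype: whole-number per cent of the subsample}
    """
    rng = random.Random(seed)          # fixed seed so the draw is repeatable
    chosen = []
    for subtype, pct in percentages.items():
        quota = round(size * pct / 100)
        chosen.extend(rng.sample(pools[subtype], quota))  # without replacement
    return chosen


percentages = {
    "A/new/young": 30, "A/new/old": 34,
    "A/used/young": 18, "A/used/old": 18,
}
# Hypothetical pools of 60 usable questionnaires per subtype.
pools = {name: [f"{name}#{i}" for i in range(60)] for name in percentages}
subsample = draw_subsample(pools, percentages, size=100)
print(len(subsample))
```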

The questionnaires from each subsample 
were scored with the “Car Salesmen in Gen- 
eral Key.” In addition, the questionnaires 
from the six subvariety subsamples were 
scored with the appropriate subvariety keys. 
The total scores were correlated (point bi- 
serial) against the dichotomized criterion of 
sales performance. 
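For reference, the point-biserial coefficient between a total score and a dichotomized criterion can be computed as in the following sketch. The function name and the four scores are hypothetical. The formula is the standard one, equivalent to a Pearson r with the criterion coded 0 and 1.

```python
import math


def point_biserial(scores, is_better):
    """r_pbis = (M_better - M_poorer) / SD_total * sqrt(p * q).

    SD_total is the standard deviation of all scores (divisor n);
    p and q are the proportions of better and poorer salesmen.
    """
    n = len(scores)
    better = [s for s, flag in zip(scores, is_better) if flag]
    poorer = [s for s, flag in zip(scores, is_better) if not flag]
    p = len(better) / n
    q = 1.0 - p
    mean_all = sum(scores) / n
    sd = math.sqrt(sum((s - mean_all) ** 2 for s in scores) / n)
    mean_diff = sum(better) / len(better) - sum(poorer) / len(poorer)
    return mean_diff / sd * math.sqrt(p * q)


# Hypothetical key scores for two poorer and two better salesmen.
r = point_biserial([10, 12, 14, 16], [0, 0, 1, 1])
print(round(r, 3))
```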


Results 


The results are summarized in Table 1. 
When the Car Salesmen in General Key was 
applied to an appropriate sample of car sales- 
men in general (Subsample 1) the correlation 
was .31, significant beyond the .01 level. 

The validities of the various Subvariety Keys ranged from .37 to .07. The strongest relationship was found when the Car B Key was applied to Car B salesmen (Subsample 3, r Pbis. of .37). However, that correlation was not significantly different from the value of .31 obtained with the Car Salesmen in General Key on Subsample 1 (t = .45). 


Table 1 

Validity Coefficients for Car Salesmen in General Key and Subvariety Keys 

Columns: Subsample; Type of Salesmen; r Pbis. (Appropriate Subvariety Key Used); r Pbis. (Car Salesmen in General Key Used Throughout). 

Subsample   Type of Salesmen          Legible entry 
1           Car Salesmen in General   [illegible] 
2           Car A                     [illegible] 
3           Car B                     .30** 
4           New-Car                   [illegible] 
5           Used-Car                  .03 
6           Young                     .18* 
7           Old                       .23* 

Only one coefficient per row is legible, and its column cannot be determined from the surviving layout. 

* Significant at .05 level. 

** Significant at .01 level. 

The coefficients were transformed to Fisher 
z coefficients for the purpose of averaging. 
The average of the six correlations obtained 
when the subvariety keys were applied to 
Samples 2 through 7 was .22 and the average 
of the correlations for these same samples 
when the Car Salesmen in General Key was 
used was .21. 
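The Fisher z transformation serves both for averaging coefficients and for comparing two independent ones. The sketch below shows both uses. The function names are hypothetical, the list of six coefficients is invented for illustration, and the subsample size of 95 is the approximate figure given earlier, so the comparison statistic comes out only close to the reported value of .45 (about .46 under these assumptions).

```python
import math


def average_r(rs):
    """Average correlations by way of Fisher's z = atanh(r)."""
    mean_z = sum(math.atanh(r) for r in rs) / len(rs)
    return math.tanh(mean_z)


def z_for_difference(r1, n1, r2, n2):
    """Normal deviate for the difference between two independent r's."""
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (math.atanh(r1) - math.atanh(r2)) / se


# Car B Key on Subsample 3 (.37) versus the general key on Subsample 1 (.31),
# assuming 95 cases in each subsample.
z = z_for_difference(0.37, 95, 0.31, 95)
print(round(z, 2))

# Hypothetical set of six coefficients, averaged through z.
print(round(average_r([0.10, 0.15, 0.20, 0.25, 0.30, 0.35]), 3))
```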

Some justification for limiting the keys to 
40 items was found in the following compari- 
son. An alternate Car Salesmen in General 
Key was prepared which consisted of the 40 
items in the Car Salesmen in General Key itself plus the next 20 most significant items, for a total of 60 items. This alternate key was used to rescore the cases from Subsample 1 (see Table 1) and the scores were correlated with the criterion. The correlation with the alternate key was .29, whereas the original 40-item key correlated .31. 


Conclusions 


It was concluded that the more elaborate 
procedure of developing specific keys for specific subvarieties of car salesmen did not result in any significant improvement in validity 
compared with the less elaborate procedure of 
developing a single key for car salesmen in 
general without regard for the various sub- 
varieties of salesmen. 

It is quite possible that if subvarieties other 
than the ones used here had been considered 
for special treatment the results might have 
been different. 

Validity as used throughout this report re- 
fers to “concurrent” validity as contrasted to 
“predictive” validity. The degree to which 
any of these keys would predict sales perform- 
ance when applied to a population of appli- 
cants was not determined. It is quite pos- 
sible that the obtained validities may be dif- 
ferent when the instruments are used with ap- 
plicant populations who complete the ques- 
tionnaire under motivating conditions which 
may be different from those of the salesmen 
used in this study. A program of research 
currently is being planned to evaluate the in- 
strument with applicant populations. 


Received October 31, 1957. 


References 


1. Anderson, V. V. Psychiatry in industry. New 
York: Harper, 1929, 222-243. 

2. Bills, Marion A. Selection of casualty and life 
insurance agents. J. appl. Psychol., 1941, 25, 
6-10. 

3. Bills, Marion A. A tool for selection that has 
stood the test of time. In L. L. Thurstone 
(Ed.), Applications of psychology. (1st ed.) 
New York: Harper, 1952, 131-137. 

4. Bolanovich, D. L., & Kirkpatrick, F. H. Meas- 
urement and the selection of salesmen. Educ. 
psychol. Measmt, 1943, 3, 333-339. 

5. Chapple, E. B., & Donald, G., Jr. An evaluation 
of department store salespeople by the Inter- 
action Chronograph. J. Marktg, 1947, 12, 
173-185. 

6. Husband, R. W. Techniques of salesmen selec- 
tion. Educ. psychol. Measmt, 1949, 9, 129- 
148. 

7. Kornhauser, Arthur W., & Shultz, R. S. Re- 
search on selection of salesmen. J. appl. Psy- 
chol., 1941, 25, 1-5. 

8. Stokes, T. M. Selection in a sales organization. 
J. appl. Psychol., 1941, 25, 41-47. 

9. Wallace, S. R., Jr. Validities of a selection test 
for different criteria and sample components. 
Amer. Psychologist, 1948, 3, 330. (Abstract) 

10. Wallace, S. R., Jr., & Twitchell, C. M. Manage- 
ment procedures and test validities. Person- 
nel Psychol., 1949, 3, 277-292. 








Journal of Applied Psychology 
Vol. 42, No. 3, 1958 


An Empirical Comparison of Two Methods of Test Selection 
and Weighting 


C. H. Lawshe and Paul J. Patinka 


Occupational Research Center, Purdue University 


Combining two or more predictor scores 
into a single score is a recurring task of the 
personnel psychologist. Questions concerning 
the number of predictors that can profitably 
be used and the particular weights to be as- 
signed to them underlie this activity. Basic 
to both of these questions is the matter of 
precision of prediction and the economy of 
the practitioner’s time. 

The standard method for selecting predic- 
tors to be included in a battery has been the 
Wherry-Doolittle Test Selection Technique; 
once predictors have been selected they are 
commonly assigned “b” weights through the 
solution of normal equations. In recent years, 
several other selection methods have been pro- 


posed, among them a short-cut method devel- 
oped by Jenkins (1); the same author has also 
proposed an abbreviated weighting method. 
The purpose of this study is: (a) to compare 
the two methods of test selection, and (b) to 
compare the two weighting methods, both in 
terms of computational time and precision of 
results. 


Samples 


Data were available for two companies of trainees who attended a Naval Training School, which was operated for the purpose of training Electrician’s Mates, Third Class. These data included scores on the nine psychological tests (administered prior to admission) listed below, as well as the earned Grade Point Average (GPA), which was used as the criterion of efficiency: 


A—The Adaptability Test 

B—Purdue Industrial Training Classification Test 

C—Navy General Classification Test 

D—Navy Mechanical Ability Test 

E—Navy Arithmetic Test 

F—Navy English Test 

G—Navy Spelling Test 

H—Navy Radio Aptitude Test 

I—Purdue Electrical Information Test 


The number of trainees in Sample 1 and Sample 2 were 184 and 176, respectively. Table 1 shows the coefficient of correlation of each test with the criterion as well as the intercorrelations between the tests. 


Table 1 
Correlations of Predictors with GPA and with Each Other 

The body of this table (GPA and Tests A through I) is not recoverable. Surviving entries, in the order extracted and without row or column positions: .440, .568, .512, .422, .499, .448, .247, .243, .235, .205, .018, .570, .547, .461, .600, .463, .287, .346, .401. 

Note.—The upper entry in each case refers to Sample 1 while the lower entry refers to Sample 2. 


Table 2 

Comparison of Tests Selected and Multiple R’s Computed by Two Methods on Two Samples of Trainees 

            Sample 1                          Sample 2 
      Wherry-Doolittle   Jenkins        Wherry-Doolittle     Jenkins 
Line  Test   R           Test   R       Test   R             Test   R 
1*    B      .677        B      .677    B      .653          B      [illegible] 
2     I      .752        A      .752    D      .722          D      [illegible] 
3     A      .794        I      .796    A      .765          A      [illegible] 
4     F      .803        F      .800    I      [illegible]   G      [illegible] 
5     E      .810        G      .804    E      [illegible]   I      [illegible] 

* Entries on Line 1 are all zero order r’s and represent the highest correlation between a single test and the criterion. 


Method 


Procedure for selection analysis. In the case of 
the Wherry-Doolittle Test Selection Technique, the 
standard procedure outlined in Stead and Shartle 
(2) was followed. The procedure followed for the 
short-cut method was that proposed by Jenkins 
(1), using the table supplied by him. It should be 
mentioned that this table was entered directly to the 
nearest values; no interpolations were made. Table 2 
shows the tests selected and the shrunken multiple R 
obtained after the subsequent addition of each test. 
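Neither the Wherry-Doolittle worksheet nor Jenkins' table is reproduced in this report. As a stand-in, the sketch below performs a plain greedy forward selection: at each step it adds the test that most increases the multiple R computed from the normal equations. It applies no shrinkage correction, so its R's are not the shrunken values reported in Table 2. The function names and the three-test correlation matrix are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def multiple_r_squared(rxx, rxy, subset):
    """R^2 = sum(beta_i * r_iy), with betas from the normal equations."""
    sub = [[rxx[i][j] for j in subset] for i in subset]
    betas = solve(sub, [rxy[i] for i in subset])
    return sum(beta * rxy[i] for beta, i in zip(betas, subset))


def forward_select(rxx, rxy, k):
    """Greedy selection; returns [(test index, multiple R so far), ...]."""
    chosen, history = [], []
    while len(chosen) < k:
        candidates = [i for i in range(len(rxy)) if i not in chosen]
        best = max(candidates,
                   key=lambda i: multiple_r_squared(rxx, rxy, chosen + [i]))
        chosen.append(best)
        history.append((best, multiple_r_squared(rxx, rxy, chosen) ** 0.5))
    return history


# Hypothetical intercorrelations among three tests and their validities.
rxx = [[1.0, 0.5, 0.1], [0.5, 1.0, 0.2], [0.1, 0.2, 1.0]]
rxy = [0.60, 0.50, 0.45]
history = forward_select(rxx, rxy, 3)
print([(i, round(r, 3)) for i, r in history])
```

In this illustration the second test chosen is the one with the lower validity but the smaller overlap with the first, which is the behavior any such selection method is meant to capture.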

In a practical situation it is not likely that the se- 
lection process would have been continued beyond 
three tests. In the current application, however, the 
process was extended through five variables. After 
five tests were included the differences in the ob- 


tained R’s for Sample 1 and Sample 2 are .006 and 
.003, respectively. 

Discussion. Considering the selection of three 
tests, note that in Sample 1 the two methods se- 
lected identical tests but in a different order while 
in Sample 2, the methods chose the same tests in the 
same order. In Sample 1, the two methods pro- 
duced R’s differing by only .002 while the difference 
in Sample 2 was .007. Neither of these variations 
can be considered to have any practical significance. 

Procedure for weighting. While neither of the 
variations just discussed is of practical importance, 
the two selection methods did choose the three tests 
in a different order in Sample 1. For this reason, 
Sample 1 data were used as a basis for setting up 
weights. Standard score or Beta weights were es- 
tablished by means of the normal equation ap- 
proach. In addition, weights were established using 
the procedure proposed by Jenkins. Each of the 
two sets of weights was in turn converted to b or raw score weights. To obtain the functioning weights in each equation each of the b weights was divided by the smallest b weight in that equation, resulting in the following equations: 


Jenkins: Composite Score = $1.84X_B + 1.74X_A + 1.00X_I$ 


Wherry-Doolittle: Composite Score = $2.40X_B + 2.07X_A + 1.00X_I$ 


Fig. 1. Wherry-Doolittle institutional expectancy chart showing the percentage of Naval electrical trainees who will be superior when various composite score minima are used. (Chart not reproduced.) 


Holdout group. Test scores were available for a 
third sample of 197 trainees (Sample 3). Both 
equations were independently applied to the scores 
of each subject in Sample 3. Pearsonian r’s were 
then computed between the resulting composite 
scores for each method and the GPA’s. 
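The weighting and holdout steps can be sketched as follows. The conversion from Beta to b weights uses the usual relation b = Beta times (SD of criterion / SD of predictor); because every weight is then divided by the smallest, the criterion SD cancels and is omitted. The function names, Beta weights, and standard deviations are hypothetical, positive weights are assumed, and the test letters B, A, and I attached to the published Jenkins weights are a reading of subscripts that are garbled in this copy.

```python
import math


def functioning_weights(betas, predictor_sds):
    """Beta weights -> raw score weights -> divided by the smallest weight."""
    raw = {test: betas[test] / predictor_sds[test] for test in betas}
    smallest = min(raw.values())   # assumes all weights are positive
    return {test: weight / smallest for test, weight in raw.items()}


def composite(weights, scores):
    """Weighted sum of one trainee's raw test scores."""
    return sum(weights[test] * scores[test] for test in weights)


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical Beta weights and predictor SDs.
weights = functioning_weights({"B": 0.4, "A": 0.3, "I": 0.2},
                              {"B": 5.0, "A": 5.0, "I": 10.0})
print({test: round(w, 2) for test, w in weights.items()})

# Published Jenkins weights applied to one hypothetical trainee.
jenkins = {"B": 1.84, "A": 1.74, "I": 1.00}
print(round(composite(jenkins, {"B": 10, "A": 20, "I": 30}), 1))
```

Applying `composite` to every trainee in a holdout sample and passing the results with the GPA's to `pearson_r` gives the kind of cross-validity coefficient reported in the next section.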


Results and Discussion 


The obtained r between GPA and predicted 
composite score for the Wherry-Doolittle 
method was .648, while that for the Jenkins 
method was .646. Figures 1 and 2 are in- 
stitutional expectancy charts based on the 
composite scores computed by each method. 
Note that the greatest discrepancy occurs 
when the “best 40%” and the “best 60%” 
are considered; for example in Fig. 1, a mini- 
mum regression equation score of 115 pro- 
duces 69% “superior,” while in Fig. 2, a mini- 
mum Jenkins composite score of 100 produces 
71% “superior.” Differences of such magni- 
tude certainly have no practical significance 
in industrial or other prediction situations. 
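An institutional expectancy chart of this kind tabulates, for each minimum composite score, the percentage of trainees at or above that score who are "superior." The sketch below shows the tabulation. The function name and data are hypothetical, and since the definition of "superior" is not given in this section, it is taken as a supplied 0/1 flag.

```python
def expectancy_table(scores, superior, minima):
    """For each minimum score: (minimum, number retained, per cent superior)."""
    rows = []
    for cut in minima:
        kept = [flag for score, flag in zip(scores, superior) if score >= cut]
        pct = 100.0 * sum(kept) / len(kept) if kept else None
        rows.append((cut, len(kept), pct))
    return rows


# Hypothetical composite scores and superior (1) / not superior (0) flags.
scores = [90, 95, 100, 105, 110, 115, 120, 125]
superior = [0, 0, 1, 0, 1, 1, 1, 1]
table = expectancy_table(scores, superior, minima=[90, 100, 115])
for cut, n_kept, pct in table:
    print(cut, n_kept, round(pct, 1))
```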

Most important of all, however, is the time saving inherent in the two Jenkins procedures. The investigators have estimated that the Wherry-Doolittle Test Selection Technique requires from five to eight times as long as does Jenkins’ selection procedure. Jenkins’ weighting method can be applied to three variables in approximately 10 minutes, in contrast to the much more time-consuming normal equation approach. While one cannot generalize from one application, it is important to point out that the results reported here are completely consistent with Jenkins’ claim. It would appear that he has made a major contribution to the psychologist confronted with applied personnel prediction situations. 


Fig. 2. Jenkins institutional expectancy chart showing the percentage of Naval electrical trainees who will be superior when various composite score minima are used. (Chart not reproduced; the composite score minima shown are 116, 108, 100, and 93.) 


Received December 23, 1957. 
Early Publication. 


References 


1. Jenkins, W. L. An improved short-cut method 
for multiple R. Educ. psychol. Measmt, 1952, 
12, 316-322. 

2. Stead, W. H., & Shartle, C. L., and Associates. 
Occupational counseling techniques. New 
York: American Book Co., 1940. 

3. Wherry, R. J. A new formula for predicting the 
shrinkage of the coefficient of multiple corre- 
lation. Ann. math. Statist., 1931, 2, 440-451. 








